K. Yoshii, K. Iskra, P.C. Broekema, H. Naik, and P. Beckman (2008)
Characterizing the Performance of Big Memory on Blue Gene Linux
Proceedings of the 2nd International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2).
Using Linux for high-performance applications on the compute nodes of IBM Blue Gene/P is challenging because of TLB misses and difficulties with programming the network DMA engine. We present a design and implementation of "big memory": an alternative, transparent memory space for computational processes that addresses these difficulties. Big memory uses the extremely large memory pages available on PowerPC CPUs to create a TLB-miss-free, flat memory area that can hold application code and data and is easier to use for DMA operations. Single-node benchmarks show that the performance gap narrows from more than a factor of 3, observed with a standard Linux kernel, to just 0.03–0.2% with big memory. We verify this at a scale of 1024 nodes using the NAS Parallel Benchmarks suite, finding that performance under Linux with big memory support fluctuates within 0.7% of the vendor microkernel. Although originally intended exclusively for compute-node tasks, our new memory subsystem turns out to dramatically improve the performance of certain applications on the I/O nodes as well, as demonstrated with LOFAR.