Managing Locality in Grand Challenge Applications: A Case Study of the Gyrokinetic Toroidal Code
by Jennifer Harris — last modified 2009-04-20 10:54
Achieving high performance with grand challenge applications on today’s large-scale parallel systems requires tailoring applications to the characteristics of modern microprocessor architectures. As part of the US Department of Energy’s Scientific Discovery through Advanced Computing (SciDAC) program, we studied and tuned the Gyrokinetic Toroidal Code (GTC), a particle-in-cell code developed at the Princeton Plasma Physics Laboratory for simulating turbulent transport of particles and energy in burning plasma. In this paper, we present a performance study of the application that revealed several opportunities for improving performance by enhancing its data locality. We tuned GTC by performing three kinds of transformations: static data structure reorganization to improve spatial locality, loop nest restructuring for better temporal locality, and dynamic data reordering at run-time to enhance both spatial and temporal reuse. Experimental results show that these changes reduce execution time by more than 20% on large parallel systems, including a Cray XT4.
Proc. of the SciDAC 2008 Conference, Journal of Physics
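To make the idea of dynamic data reordering in the abstract concrete, the following is a minimal sketch, not GTC's actual implementation. It assumes that each particle is tagged with the index of the grid cell it currently occupies, and that particle attributes are stored in separate per-attribute arrays. The function names `reorder_by_cell` and `apply_order`, and the sample data, are hypothetical and chosen only for illustration. Grouping particles by cell means that consecutive particles touch nearby grid data, which is the kind of spatial and temporal reuse the abstract refers to.

```python
def reorder_by_cell(cell_ids, num_cells):
    """Return a permutation that groups particles by cell index.

    Uses a stable counting sort, so particles within the same cell
    keep their original relative order. (Illustrative assumption:
    cell_ids[p] is the grid cell that particle p currently occupies.)
    """
    # counts[c + 1] holds the number of particles in cell c.
    counts = [0] * (num_cells + 1)
    for c in cell_ids:
        counts[c + 1] += 1

    # Prefix sum: counts[c] becomes the first output slot for cell c.
    for c in range(num_cells):
        counts[c + 1] += counts[c]

    order = [0] * len(cell_ids)
    next_slot = counts[:-1]  # copy of the per-cell starting offsets
    for p, c in enumerate(cell_ids):
        order[next_slot[c]] = p
        next_slot[c] += 1
    return order


def apply_order(order, *arrays):
    """Permute each per-attribute particle array with the same ordering."""
    return [[a[i] for i in order] for a in arrays]


# Hypothetical sample data: six particles spread over three cells.
cell_ids = [2, 0, 1, 0, 2, 1]
positions = [0.9, 0.1, 0.5, 0.2, 0.8, 0.6]

order = reorder_by_cell(cell_ids, num_cells=3)
sorted_cells, sorted_positions = apply_order(order, cell_ids, positions)
print(sorted_cells)      # particles are now grouped by cell
print(sorted_positions)
```

In a real particle-in-cell code, a reordering like this would be repeated periodically at run time as particles move between cells, and the cost of the sort would have to be weighed against the locality gained.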