John Mellor-Crummey (2008)
Pinpointing and exploiting opportunities for enhancing data reuse.
Proc. of the 2008 IEEE Intl. Symposium on Performance Analysis of Systems and Software.
The potential for improving the performance of
data-intensive scientific programs by enhancing data reuse in
cache is substantial because CPUs are significantly faster than
memory. Traditional performance tools typically collect or simulate
cache miss counts or rates and attribute them at the function
level. While such information identifies program scopes that
exhibit a large cache miss rate, it is often insufficient to diagnose
the causes of poor data locality or to identify which program
transformations would improve memory hierarchy utilization.
This paper describes an approach that uses memory reuse
distance to identify an application’s most significant memory
access patterns causing cache misses and provide insight into
ways of improving data reuse. Unlike previous approaches, our
tool combines (1) analysis and instrumentation of fully optimized
binaries, (2) online analysis of reuse patterns, (3) fine-grain
attribution of measurements and models to statements, loops and
variables, and (4) static analysis of access patterns to quantify
spatial reuse. We demonstrate the effectiveness of our approach
for understanding reuse patterns in two scientific codes: one
for simulating neutron transport and a second for simulating
turbulent transport in burning plasmas. Our tools pinpointed
opportunities for enhancing data reuse. Using this feedback as a
guide, we transformed the codes, reducing their misses at various
levels of the memory hierarchy by integer factors and reducing
their execution time by as much as 60% and 33%, respectively.
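
The metric underlying this approach, reuse distance, is the number of distinct memory blocks accessed between two consecutive accesses to the same block. In a fully associative LRU cache holding C blocks, an access misses exactly when it is a first touch or its reuse distance is at least C. The following is a minimal illustrative sketch of that definition, not the paper's tool: the function names, the 64-byte block size, and the simple quadratic-time stack scan are assumptions chosen for clarity, whereas the tool described above instruments optimized binaries and analyzes reuse online.

```python
from collections import OrderedDict


def reuse_distances(trace, block_size=64):
    """Return the reuse distance of each access in an address trace.

    The distance is the number of distinct blocks touched since the
    previous access to the same block; None marks a first access.
    block_size is an assumed cache-line size in bytes.
    """
    stack = OrderedDict()  # blocks ordered from least to most recently used
    distances = []
    for addr in trace:
        block = addr // block_size
        if block in stack:
            keys = list(stack.keys())
            # Blocks after this one in the ordering were used more recently.
            distances.append(len(keys) - 1 - keys.index(block))
            stack.move_to_end(block)
        else:
            distances.append(None)
            stack[block] = True
    return distances


def lru_misses(distances, capacity_blocks):
    """Count misses in a fully associative LRU cache of the given capacity."""
    return sum(1 for d in distances if d is None or d >= capacity_blocks)


if __name__ == "__main__":
    # Hypothetical trace touching three blocks in a cyclic pattern.
    trace = [0, 64, 128, 0, 64, 0]
    dists = reuse_distances(trace)
    print(dists)
    print(lru_misses(dists, capacity_blocks=2))
    print(lru_misses(dists, capacity_blocks=3))
```

Because a single pass yields the full distribution of distances, the same measurement can predict miss counts for any cache capacity, which is what makes the metric useful for reasoning about several levels of the memory hierarchy at once.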