
S.W. Williams, D.A. Patterson, L. Oliker, J. Shalf, and K. Yelick (2008)

The Roofline Model: A pedagogical tool for auto-tuning kernels on multicore architectures

In: HOT Chips, A Symposium on High Performance Chips.

We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to sparse matrix-vector multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the high-performance computing literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we develop a code generator for each kernel that allows us to identify a highly optimized version for each platform while amortizing the human programming effort. Results show that our auto-tuned kernels often achieve a better than 4× improvement over the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.
Stanford, CA
by Jennifer Harris last modified 2009-04-21 10:21
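
The abstract refers to a Roofline performance model without stating its form. As a rough illustration, the basic Roofline bound is usually written as attainable performance = min(peak floating-point rate, peak memory bandwidth × arithmetic intensity). The sketch below assumes that basic form only. The function names and the machine numbers are hypothetical and are not taken from the paper or from any of the platforms it studies.

```python
def roofline_gflops(peak_gflops, peak_bandwidth_gbs, arithmetic_intensity):
    """Upper bound on attainable GFLOP/s under the basic Roofline model.

    arithmetic_intensity is measured in flops per byte of memory traffic.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # A kernel is limited by whichever ceiling is lower: compute or memory.
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which the memory and compute ceilings meet."""
    return peak_gflops / peak_bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine, chosen only to keep the arithmetic simple.
    PEAK_GFLOPS = 64.0        # assumed peak compute rate, GFLOP/s
    PEAK_BANDWIDTH_GBS = 16.0  # assumed peak memory bandwidth, GB/s

    # Hypothetical kernels with made-up arithmetic intensities (flops/byte).
    for name, intensity in [("low-intensity kernel", 0.25),
                            ("high-intensity kernel", 8.0)]:
        bound = roofline_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS, intensity)
        limit = ("memory"
                 if intensity < ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
                 else "compute")
        print(f"{name}: at most {bound} GFLOP/s ({limit}-bound)")
```

On this assumed machine, a kernel below the ridge point is capped by memory bandwidth and one above it is capped by peak compute. That is the kind of bottleneck the abstract says the model is used to reveal.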
