Workshop on Automatic Tuning for Petascale Systems
July 8-11, 2008, Snowbird Ski & Summer Resort, Snowbird, Utah, USA
Organizers
Kathy Yelick (University of California at Berkeley), "yelick" AT "cs.berkeley.edu"
Jack Dongarra (University of Tennessee), “dongarra” AT “cs.utk.edu”
Rich Vuduc (Georgia Institute of Technology), “richie” AT “cc.gatech.edu”
Keith Cooper (Rice University), "keith" AT "cs.rice.edu"
Abstract
Over the last decade, microprocessor features such as deep pipelines, multiple cores, and complex memory hierarchies have made it increasingly difficult to achieve good performance in scientific applications and libraries. This fact has given rise to systems that automate the tuning process, using large amounts of computation to configure the application or library for good performance on the target architecture.
This workshop, the second in the series, will bring together researchers and practitioners in automatic tuning, in library design and construction, and in compiler-based code generation to identify and discuss opportunities and challenges in the use of automatic tuning for future petascale systems. As we did last year, we are actively soliciting participation from members of the compiler and library autotuning communities. In addition, we are interested in attracting attendees whose focus is on runtime optimization of scientific programs.
The workshop will consist of a series of talks and discussion sessions. We anticipate that most workshop attendees will present a talk on their research, their insights, and their experience. The discussion sessions will focus on opportunities to build shared infrastructure, along with issues raised during the workshop.
Agenda
The emphasis of this year's workshop is on the boundaries and interactions between specialized, library-based autotuning systems and current and future compilation environments. In particular, we ask the speakers to present research talks that take a "position" on one or more of the following questions (alternatively, feel free to make up your own). It would be helpful if your first slide explicitly stated the questions/positions you are taking. We will use the open discussion periods and working dinners to tackle these questions as well.
Positions and Questions
- Proposition: Today's autotuning work does/does not address the challenges of petascale. (What challenges need to be addressed?)
- Question: How do we measure success for tuning? How do we measure and weigh performance? Productivity?
- Question: What architectures/platforms should we target?
- Proposition/Question: “Parameter tuning” is the wrong focus for our area, and will lead only to incremental improvements; what problems should we look at instead?
- Proposition: Self-tuned libraries will always outperform compiler-generated code.
- Question: What improvements should we expect from autotuning, at both the compiler level and the library level? Will we allow future compilers to change data structures and algorithms?
- Proposition: Simple performance models (e.g., a cache-oblivious model, or assuming systems with simple in-order cores) will be the right models in the future and will obviate the need for empirical search.
- Proposition: The traditional boundaries between applications, libraries, compilers, and operating systems are too rigid and need to be changed. Example: What in our app-specific schedulers could be moved into the OS, if anything?
- Question: What issues are we as a community ignoring? What other technologies should we investigate to find application-specific, platform-specific improvement?
- Question: What can we do to build common tool bases for compiler-based autotuning and for construction of self-tuning or autotuning libraries?
- Proposition: The focus on specialized tuning systems is too narrow, and so compilers, which apply most broadly, are the most sensible investment.
- Proposition: Runtime optimization will catch opportunities for improvement that neither a compiler nor an autotuned library can.
- Question: Suppose all layers of the software stack (e.g., OS, middleware, MPI, libraries, apps) are “autotuned.” Will we need to integrate these multiple layers, and if so, how?
Day 1 - Tuesday, July 8
Morning
- 8:30-9:00am: Continental breakfast
- 9:00-9:15am: Keith Cooper (Rice) and Rich Vuduc (Georgia Tech) "Welcome"
Session: Architectural Road Maps
- 9:20-9:50am: Josh Fryman (Intel). "Petascale and parallel programming: Everything you learned in kindergarten is wrong"
- 9:50-10:20am: Ben de Waal (NVIDIA). "NVIDIA GPU"
- 10:20-10:50am: Paul Henning (LANL). "Roadrunner and the future of applications programming"
- 10:50-11:20am: Mattan Erez (UT Austin). "Autotuning for petascale: An architect's perspective"
- 11:20am-12:00pm: Group open discussion period
Afternoon
- 12:00pm-1:30pm: Lunch (on your own)
Session: Industrial Research Perspectives
- 1:30-2:00pm: Charles Fu (Microsoft). "The numerical libraries project in the Microsoft Incubation Group"
- 2:00-2:30pm: Henry Gabb (Intel). "The Intel adaptive Spike-based solver: Using software adaptation to achieve performance"
- 2:30-3:00pm: Keita Teranishi (Cray). "CASK: Cray Adaptive Sparse Kernels"
- 3:00-3:30pm: Kevin and Kathryn O'Brien (IBM). "OpenMP compilation for the Cell processor"
- 3:30-4:00pm: Break
Session: Libraries, Compilers, and Run-time Systems (I)
- 4:00-4:30pm: James Demmel (UC Berkeley). "Recent progress in autotuning at Berkeley"
- 4:30-5:00pm: Markus Pueschel (CMU). "Spiral: Automating library development"
- 5:00-5:45pm: Open discussion
Evening
- 6:00-8:00pm: Working buffet dinner.
Discussion topics: autotuning opportunities, successes to date, remaining challenges, and promising approaches.
Day 2 - Wednesday, July 9
Morning
- 8:30-9:00am: Continental breakfast
Session: Libraries, compilers, and run-time systems (II)
- 9:00-9:30am: P. Sadayappan (OSU). "A polyhedral loop transformation framework for parallelization and tuning"
- 9:30-10:00am: J. Ramanujam (LSU). "Toward automatic parallelization and auto-tuning of affine kernels for GPUs"
- 10:00-10:10am: Break
- 10:10-10:40am: Dan Hoeflinger and David Padua (UIUC). --cancelled--
- 10:40-11:10am: Chun Chen (USC/ISI). "Compiler autotuning and supporting tools"
- 11:10-11:40am: John Cavazos (UDel). "Intelligent compilation"
- 11:40-12:25pm: Open discussion.
Afternoon
- 12:25-2:00pm: Lunch (on your own)
Session: Libraries, compilers, and run-time systems (III)
- 2:00-2:30pm: Clint Whaley (UT San Antonio). "Achieving accurate and context-sensitive timing for code optimization, or how do we measure success for tuning and performance?"
- 2:30-3:00pm: Yevgen Voronenko (CMU). "General-size library generation with Spiral"
- 3:00-3:30pm: Milind Kulkarni (UT Austin). "The Galois project"
- 3:30-3:45pm: Break
- 3:45-4:15pm: Jakub Kurzak (UTK). "Modeling and tuning parallel performance in dense linear algebra"
- 4:15-4:45pm: Jeff Hollingsworth (UMD). "Experience with automated performance tuning using Active Harmony"
- 4:45-5:15pm: Rudi Eigenmann (Purdue). "Towards dynamically adaptive programs, a.k.a., autotuning"
- 5:15-5:45pm: Open discussion
- 6:00-8:00pm: Working buffet dinner.
Discussion topics: deriving application parameters, application characterization with performance counters, using regression models to help select optimizations, and plans for moving forward with autotuning.
Day 3 - Thursday, July 10
Morning
- 8:30-9:15am: Breakfast
Session: Libraries, compilers, and run-time systems (IV)
- 9:15-9:45am: Martin Swany (UDel). "Compiler-based autotuning of MPI"
- 9:45-10:15am: Qing Yi (UTSA). "POET: Parameterized Optimizations for Empirical Tuning"
- 10:15-10:45am: Shoaib Kamil (UCB). "Recent results, insights, and lessons from autotuning three motifs"
- 10:45-11:15am: Bilel Hadri (UTK). "Performance and tuning for designing a fast parallel hemodynamic simulator"
- 11:15-12:00pm: Open discussion
Afternoon
- Lunch / social activity
Sponsors
This workshop was sponsored by the Center for Scalable Application Development Software, with funding from the Scientific Discovery through Advanced Computing (SciDAC) program.