Personal tools
You are here: Home Publications Performance without Pain = Productivity, Data layouts and Collectives in UPC
Document Actions

Rajesh Nishtala, George Almasi, and Calin Cascaval (2008)

Performance without Pain = Productivity, Data layouts and Collectives in UPC

In: Principles and Practices of Parallel Programming (PPoPP).

The next generations of supercomputers are projected to have hundreds of thousands of processors. However, as the numbers of processors grow, the scalability of applications will be the dominant challenge. This forces us to reexamine some of our fundamental ways that we approach the design and use of parallel languages and runtime systems. In this paper we show how the globally shared arrays in a popular Partitioned Global Address Space (PGAS) language, Unified Parallel C (UPC), can be combined with a new collective interface to improve both performance and scalability. This interface allows subsets, or teams, of threads to perform a collective together. As opposed to MPI’s communicators, our interface allows set of threads to be placed in teams instantly rather than explicitly constructing communicators, thus allowing for a more dynamic team construction and manipulation. We motivate our ideas with three application kernels: Dense Matrix Multiplication, Dense Cholesky factorization and multidimensional Fourier transforms. We describe how the three aforementioned applications can be succinctly written in UPC thereby aiding productivity. We also show how such an interface allows for scalability by running on up to 16,384 processors on the Blue-Gene/L. In a few lines of UPC code, we wrote a dense matrix multiply routine achieves 28.8 TFlop/s and a 3D FFT that achieves 2.1 TFlop/s. We analyze our performance results through models and show that the machine resources rather than the interfaces themselves limit the performance.
Salt Lake City, USA
by Jennifer Harris last modified 2009-04-22 04:18
« September 2017 »
Su Mo Tu We Th Fr Sa
12
3456789
10111213141516
17181920212223
24252627282930
 

Powered by Plone

CScADS Collaborators include:

Rice University ANL UCB UTK WISC