Publications
This folder holds the following references to publications, sorted by year and author.
There are 79 references in this bibliography folder.
Nishtala, R, Almasi, G, and Cascaval, C
(2008).
Performance without Pain = Productivity, Data layouts and Collectives in UPC
In: Principles and Practices of Parallel Programming (PPoPP).
Raicu, I, Zhang, Z, Wilde, M, Foster, I, Beckman, P, Iskra, K, and Clifford, B
(2008).
Toward Loosely Coupled Programming on Petascale Systems
Proceedings of the 20th ACM/IEEE Conference on Supercomputing.
Rosenblum, N, Zhu, X, Miller, B, and Hunt, K
(2008).
Learning to Analyze Binary Computer Code
In: 23rd AAAI Conference on Artificial Intelligence (AAAI 2008).
Tallent, N, Mellor-Crummey, J, Adhianto, L, Fagan, M, and Krentel, M
(2008).
HPCToolkit: performance tools for scientific computing
Proc. of the SciDAC 2008 Conference, J. Phys., 125(012088).
Williams, S, Carter, J, Oliker, L, Shalf, J, and Yelick, K
(2008).
Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms
IEEE International Parallel and Distributed Processing Symposium (IPDPS’08).
Williams, S, Patterson, D, Oliker, L, Shalf, J, and Yelick, K
(2008).
The Roofline Model: A pedagogical tool for auto-tuning kernels on multicore architectures
In: HOT Chips, A Symposium on High Performance Chips.
Yoshii, K, Iskra, K, Broekema, P, Naik, H, and Beckman, P
(2008).
Characterizing the Performance of Big Memory on Blue Gene Linux
Proceedings of the 2nd International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2).
Zhang, Z, Espinosa, A, Iskra, K, Raicu, I, Foster, I, and Wilde, M
(2008).
Design and Evaluation of a Collective I/O Model for Loosely-coupled Petascale Programming
Proceedings of the 1st Workshop on Many-Task Computing on Grids and Supercomputers.
Agarwal, S, Barik, R, Bonachea, D, Sarkar, V, Shyamasundar, R, and Yelick, K
(2007).
Deadlock-Free Scheduling of X10 Computations with Bounded Resources
In: Symposium on Parallel Algorithms and Architecture (SPAA), pp. 229–240, San Diego, California, ACM.
Arnold, DC, Ahn, DH, de Supinski, BR, Lee, G, Miller, BP, and Schulz, M
(2007).
Stack Trace Analysis for Large Scale Debugging
In: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 07), Long Beach, California, IEEE.
Bordelon, A
(2007).
Developing a Scalable, Extensible Parallel Performance Analysis Toolkit
Master's thesis, Rice University, Department of Computer Science.
Budimlic, Z, Zhang, R, and Scherer, W
(2007).
Runtime Tuning of STM Validation Techniques
In: Workshop on Exploiting Parallelism with Transactional Memory.
Buttari, A, Dongarra, J, Husbands, P, Kurzak, J, and Yelick, K
(2007).
Multithreading for Synchronization Tolerance in Matrix Factorization
In: Proceedings of the SciDAC 2007 Conference, Boston, Massachusetts, Journal of Physics: Conference Series.
Buttari, A, Langou, J, Kurzak, J, and Dongarra, J
(2007).
A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures
Parallel Computing.
Buttari, A, Langou, J, Kurzak, J, and Dongarra, J
(2007).
Parallel Tiled QR Factorization for Multicore Architectures
Concurrency and Computation: Practice and Experience.
Chen, W
(2007).
Optimizing Partitioned Global Address Space Programs for Cluster Architectures
PhD thesis, University of California-Berkeley, Computer Science Division.
Chen, W, Bonachea, D, Iancu, C, and Yelick, K
(2007).
Automatic Nonblocking Communication for Partitioned Global Address Space Programs
In: Proceedings of the International Conference on Supercomputing (ICS), pp. 158–167, Seattle, Washington, ACM.
Coarfa, C, Mellor-Crummey, J, Froyd, N, and Dotsenko, Y
(2007).
Scalability Analysis of SPMD Codes Using Expectations
In: Proceedings of the International Conference on Supercomputing, pp. 13–22, Seattle, Washington, ACM.
Demmel, J, Hoemmen, M, Mohiyuddin, M, and Yelick, K
(2007).
Avoiding Communication in Computing Krylov Subspaces
University of California EECS Department.
Duell, J
(2007).
Pthreads or Processes: Which is Better for Implementing Global Address Space languages?
Master's Thesis, UC Berkeley.
Husbands, P and Yelick, K
(2007).
Multithreading and One-Sided Communication in Parallel LU Factorization
In: Proceedings of Supercomputing (SC07), Reno, Nevada, ACM.
Kamil, A and Yelick, K
(2007).
Hierarchical Pointer Analysis for Distributed Programs
In: Static Analysis Symposium (SAS), pp. 281–297, Kongens Lyngby, Denmark, Springer Berlin / Heidelberg.
Kurzak, J and Dongarra, J
(2007).
Implementation of Mixed Precision in Solving Systems of Linear Equations on the Cell Processor
Concurrency and Computation: Practice and Experience, 19(10):1371–1385.
Kurzak, J, Buttari, A, and Dongarra, J
(2007).
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
IEEE Transactions on Parallel and Distributed Systems.