Publications
This folder holds the following references to publications, sorted by year and author.
There are 79 references in this bibliography folder.
Alvaro, W, Kurzak, J, and Dongarra, J
(2009).
Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture -- CELL Processor
Parallel Computing, 35(3):138-150.
Buttari, A, Dongarra, J, Kurzak, J, and Langou, J
(2009).
Parallel Dense Linear Algebra Software in the Multicore Era
In: Cyberinfrastructure Technologies and Applications, ed. by Junwei Cao. Nova Science Publishers, Inc., 400 Oser Ave., Suite 1600, Hauppauge NY 11788-3619, chap. 1, pp. 9-24.
Kurzak, J and Dongarra, J
(2009).
QR factorization for the Cell Broadband Engine
Sci. Program, 17:1-2.
Nishtala, R and Yelick, KA
(2009).
Optimizing Collective Communication on Multicores
In: Hot Topics in Parallelism.
Nishtala, RN, Hargrove, PH, Bonachea, DO, and Yelick, KA
(2009).
Scaling Communication-Intensive Applications on BlueGene/P Using One-Sided Communication and Overlap
In: 23rd International Parallel & Distributed Processing Symposium.
Tallent, N and Mellor-Crummey, J
(2009).
Effective performance measurement and analysis of multithreaded applications
Proc. of Symposium on the Principles and Practice of Parallel Programming (PPoPP):229-240.
Tallent, N
(2008).
HPCToolkit: Tools for performance analysis of optimized parallel programs
Preprint.
Ahn, D, Arnold, D, de Supinski, B, and Lee, G
(2008).
Overcoming Scalability Challenges for Tool Daemon Launching
37th International Conference on Parallel Processing (ICPP-08).
Arnold, D
(2008).
Reliable, Scalable Tree-Based Overlay Networks
PhD Thesis, University of Wisconsin.
Beckman, P, Iskra, K, Yoshii, K, Coghlan, S, and Nataraj, A
(2008).
Benchmarking the Effects of Operating System Interference on Extreme-Scale Parallel Machines
Cluster Computing, 11(1):3-16.
Buttari, A, Langou, J, Kurzak, J, and Dongarra, J
(2008).
Parallel Tiled QR Factorization for Multicore Architectures
Concurrency and Computation: Practice and Experience, 20:1573-1590.
Cooper, K, Eckhardt, J, and Kennedy, K
(2008).
Redundancy Elimination Revisited
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques:12-21.
Datta, K, Kamil, S, Williams, S, Oliker, L, Shalf, J, and Yelick, K
(2008).
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
SIAM Review (SIREV).
Datta, K, Murphy, M, Volkov, V, Williams, S, Carter, J, Oliker, L, Patterson, D, Shalf, J, and Yelick, K
(2008).
Stencil Computation Optimization and Autotuning on State-of-the-Art Multicore Architectures
Supercomputing 2008 (SC08).
Dongarra, J, Pineau, J, Robert, Y, Shi, Z, and Vivien, F
(2008).
Revisiting Matrix Product on Master-Worker Platforms
International Journal of Foundations of Computer Science (IJFCS), 19(6):1317-1336.
Iancu, C, Chen, W, and Yelick, K
(2008).
Performance portable optimizations for loops containing communication operations
In: International Conference on Supercomputing, pp. 266-276.
Jain, A
(2008).
pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures
Master's Thesis, University of California at Berkeley.
Jain, A, Kamil, S, Mohiyuddin, M, Shalf, J, and Kubiatowicz, JD
(2008).
Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMP
In: Hot Interconnects.
Kurzak, J, Buttari, A, and Dongarra, J
(2008).
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
IEEE Transactions on Parallel and Distributed Systems, 19(9):1-11.
Kurzak, J, Buttari, A, Luszczek, P, and Dongarra, J
(2008).
The PlayStation 3 for High Performance Scientific Computing
Computing in Science and Engineering:80-83.
Mellor-Crummey, J
(2008).
Pinpointing and exploiting opportunities for enhancing data reuse
Proc. of the 2008 IEEE Intl. Symposium on Performance Analysis of Systems and Software.
Marin, G and Mellor-Crummey, J
(2008).
Pinpointing and exploiting opportunities for enhancing data reuse
Proc. of the 2008 IEEE Intl. Symposium on Performance Analysis of Systems and Software.
Mellor-Crummey, J
(2008).
Managing Locality in Grand Challenge Applications: A Case Study of the Gyrokinetic Toroidal Code
Proc. of the SciDAC 2008 Conference, Journal of Physics, 125(012087).
Mirgorodskiy, A and Miller, B
(2008).
Diagnosing Distributed Systems with Self-Propelled Instrumentation
Lecture Notes in Computer Science, 5346.
Nataraj, A, Malony, A, Morris, A, Arnold, D, and Miller, B
(2008).
In Search of Sweet-Spots in Parallel Performance Monitoring
IEEE International Conference on Cluster Computing (Cluster 2008).