Alvaro, W, Kurzak, J, and Dongarra, J (2009). Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture -- CELL Processor Parallel Computing, 35(3):138-150.
Buttari, A, Dongarra, J, Kurzak, J, and Langou, J (2009). Parallel Dense Linear Algebra Software in the Multicore Era In: Cyberinfrastructure Technologies and Applications, ed. by Junwei Cao. Nova Science Publishers, Inc., 400 Oser Ave., Suite 1600, Hauppauge NY 11788-3619, chap. 1, pp. 9-24.
Kurzak, J and Dongarra, J (2009). QR factorization for the Cell Broadband Engine Sci. Program, 17:1-2.
Nishtala, R and Yelick, KA (2009). Optimizing Collective Communication on Multicores In: Hot Topics in Parallelism.
Nishtala, RN, Hargrove, PH, Bonachea, DO, and Yelick, KA (2009). Scaling Communication-Intensive Applications on BlueGene/P Using One-Sided Communication and Overlap In: 23rd International Parallel & Distributed Processing Symposium.
Tallent, N and Mellor-Crummey, J (2009). Effective performance measurement and analysis of multithreaded applications Proc. of Symposium on the Principles and Practice of Parallel Programming (PPoPP):229-240.
Adhianto, L, Banerjee, S, Fagan, M, Krentel, M, Marin, G, Mellor-Crummey, J, and Tallent, N (2008). HPCToolkit: Tools for performance analysis of optimized parallel programs Preprint.
Ahn, D, Arnold, D, de Supinski, B, and Lee, G (2008). Overcoming Scalability Challenges for Tool Daemon Launching 37th International Conference on Parallel Processing (ICPP-08).
Arnold, D (2008). Reliable, Scalable Tree-Based Overlay Networks PhD Thesis, University of Wisconsin.
Beckman, P, Iskra, K, Yoshii, K, Coghlan, S, and Nataraj, A (2008). Benchmarking the Effects of Operating System Interference on Extreme-Scale Parallel Machines Cluster Computing, 11(1):3-16.
Buttari, A, Langou, J, Kurzak, J, and Dongarra, J (2008). Parallel Tiled QR Factorization for Multicore Architectures Concurrency and Computation: Practice and Experience, 20:1573-1590.
Cooper, K, Eckhardt, J, and Kennedy, K (2008). Redundancy Elimination Revisited Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques:12-21.
Datta, K, Kamil, S, Williams, S, Oliker, L, Shalf, J, and Yelick, K (2008). Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors SIAM Review (SIREV).
Datta, K, Murphy, M, Volkov, V, Williams, S, Carter, J, Oliker, L, Patterson, D, Shalf, J, and Yelick, K (2008). Stencil Computation Optimization and Autotuning on State-of-the-Art Multicore Architectures Supercomputing 2008 (SC08).
Dongarra, J, Pineau, J, Robert, Y, Shi, Z, and Vivien, F (2008). Revisiting Matrix Product on Master-Worker Platforms International Journal of Foundations of Computer Science (IJFCS), 19(6):1317-1336.
Iancu, C, Chen, W, and Yelick, K (2008). Performance portable optimizations for loops containing communication operations In: International Conference on Supercomputing, pp. 266-276.
Jain, A (2008). pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures Master's Thesis, University of California at Berkeley.
Jain, A, Kamil, S, Mohiyuddin, M, Shalf, J, and Kubiatowicz, JD (2008). Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMP In: Hot Interconnects.
Kurzak, J, Buttari, A, and Dongarra, J (2008). Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization IEEE Transactions on Parallel and Distributed Systems, 19(9):1-11.
Kurzak, J, Buttari, A, Luszczek, P, and Dongarra, J (2008). The PlayStation 3 for High Performance Scientific Computing Computing in Science and Engineering:80-83.
Marin, G and Mellor-Crummey, J (2008). Pinpointing and exploiting opportunities for enhancing data reuse. Proc. of the 2008 IEEE Intl. Symposium on Performance Analysis of Systems and Software.
Marin, G, Jin, G, and Mellor-Crummey, J (2008). Managing Locality in Grand Challenge Applications: A Case Study of the Gyrokinetic Toroidal Code Proc. of the SciDAC 2008 Conference, Journal of Physics, 125(012087).
Mirgorodskiy, A and Miller, B (2008). Diagnosing Distributed Systems with Self-Propelled Instrumentation Lecture Notes in Computer Science, 5346.
Nataraj, A, Malony, A, Morris, A, Arnold, D, and Miller, B (2008). In Search of Sweet-Spots in Parallel Performance Monitoring IEEE International Conference on Cluster Computing (Cluster 2008).