Jakub Kurzak, Alfredo Buttari, and Jack Dongarra (2007)
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization
IEEE Transactions on Parallel and Distributed Systems.
The STI CELL processor introduces pioneering solutions in processor architecture. At the same time it presents new challenges for the development of numerical algorithms. One is effective exploitation of the differential between the speed of single and double precision arithmetic; the other is efficient parallelization between the short vector SIMD cores. In this work, the first challenge is addressed by utilizing a mixed-precision algorithm for the solution of a dense symmetric positive definite system of linear equations, which delivers double precision accuracy, while performing the bulk of the work in single precision. The second challenge is approached by introducing much finer granularity of parallelization than has been used for other architectures and using a lightweight decentralized synchronization. The implementation of the computationally intensive sections gets within 90 percent of peak floating point performance, while the implementation of the memory intensive sections reaches within 90 percent of peak memory bandwidth. On a single CELL processor, the algorithm achieves over 170 Gflop/s when solving a symmetric positive definite system of linear equation in single precision and over 150 Gflop/s when delivering the result in double precision accuracy.