Ankit Jain (2008)
pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures
Master's Thesis, University of California at Berkeley.
We have developed pOSKI: the Parallel Optimized Sparse
Kernel Interface, an autotuning framework that optimizes
Sparse Matrix-Vector Multiply (SpMV) performance on
emerging shared-memory multicore architectures. Our
autotuning methodology extends previous work done in
the scientific computing community targeting serial architectures.
In addition to previously explored parallel optimizations,
we find that load-balanced data decomposition
is extremely important to achieving good parallel
performance on the new generation of parallel architectures.
Our best parallel configurations perform up to 9x
faster than optimized serial codes on the AMD Santa Rosa
architecture, 11.3x faster on the AMD Barcelona architecture,
and 7.2x faster on the Intel Clovertown architecture.
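
The abstract's point about load-balanced data decomposition can be illustrated with a minimal sketch. It is not taken from pOSKI itself: it assumes the matrix is stored in compressed sparse row (CSR) form and that the number of nonzeros per row is a reasonable proxy for work. The function names `balanced_row_partition` and `spmv_block` are hypothetical. The idea is to cut the rows into contiguous blocks holding roughly equal numbers of nonzeros, rather than equal numbers of rows, so that each thread does a similar amount of multiply-add work.

```python
from bisect import bisect_left


def balanced_row_partition(row_ptr, num_threads):
    """Split CSR rows into contiguous blocks with roughly equal nonzeros.

    row_ptr is the usual CSR row-pointer array (length num_rows + 1).
    Returns num_threads + 1 row boundaries; block t covers rows
    bounds[t] .. bounds[t + 1] - 1. (Hypothetical helper, not pOSKI's API.)
    """
    num_rows = len(row_ptr) - 1
    total_nnz = row_ptr[-1]
    bounds = [0]
    for t in range(1, num_threads):
        # Ideal cumulative nonzero count at the end of block t - 1.
        target = total_nnz * t / num_threads
        # First row boundary whose cumulative nonzero count reaches the target.
        cut = bisect_left(row_ptr, target)
        # Keep boundaries non-decreasing and within the matrix.
        cut = min(max(cut, bounds[-1]), num_rows)
        bounds.append(cut)
    bounds.append(num_rows)
    return bounds


def spmv_block(row_ptr, col_idx, values, x, y, row_start, row_end):
    """Compute y[i] = sum_k A[i, k] * x[k] for rows row_start .. row_end - 1."""
    for i in range(row_start, row_end):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc


if __name__ == "__main__":
    # 3x3 example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    x = [1.0, 1.0, 1.0]
    y = [0.0] * 3

    bounds = balanced_row_partition(row_ptr, num_threads=2)
    # Each block could be handed to a separate thread; here they run in turn.
    for t in range(len(bounds) - 1):
        spmv_block(row_ptr, col_idx, values, x, y, bounds[t], bounds[t + 1])
    print(bounds, y)
```

In a matrix where one row holds half of all nonzeros, an equal-rows split would leave one thread with most of the work, while the nonzero-based split above gives that row a block of its own. A real autotuner would likely also weigh memory placement and cache effects, which this sketch ignores.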