LightSpMV is a novel CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm using the standard compressed sparse row (CSR) storage format. The algorithm is written as CUDA C++ template classes and achieves high speed through fine-grained dynamic distribution of matrix rows over warps/vectors, based on atomic operations, together with efficient vector dot product computation. We have evaluated LightSpMV using various sparse matrices and compared it to the CSR-based SpMV subprograms in the leading CUSP and cuSPARSE libraries. Performance evaluation reveals that on a single Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with speedups of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE, for single-precision and double-precision floating point, respectively. In addition, for the acceleration of the PageRank graph application, LightSpMV remains consistently superior to its counterparts.
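To make the idea of dynamic row distribution concrete, here is a minimal CPU-side sketch in plain C++. It is not the LightSpMV kernel: CPU threads stand in for warps/vectors, a `std::atomic` counter stands in for the GPU atomic operation, and the names `CsrMatrix`, `spmvDynamicRows`, `numWorkers` and `chunk` are hypothetical. Each worker repeatedly claims the next chunk of rows rather than receiving a fixed range up front.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CSR container; field names are illustrative, not LightSpMV's.
struct CsrMatrix {
    std::size_t numRows = 0;
    std::size_t numCols = 0;
    std::vector<std::size_t> rowOffsets;  // numRows + 1 entries
    std::vector<std::size_t> colIndices;  // one entry per non-zero
    std::vector<double> values;           // one entry per non-zero
};

// Computes y = A * x. Rows are handed out dynamically: every worker claims
// `chunk` consecutive rows at a time by atomically advancing a shared counter.
std::vector<double> spmvDynamicRows(const CsrMatrix& a,
                                    const std::vector<double>& x,
                                    unsigned numWorkers = 4,
                                    std::size_t chunk = 2) {
    assert(a.rowOffsets.size() == a.numRows + 1);
    assert(x.size() == a.numCols);
    assert(numWorkers > 0 && chunk > 0);

    std::vector<double> y(a.numRows, 0.0);
    std::atomic<std::size_t> nextRow{0};

    auto worker = [&]() {
        for (;;) {
            // fetch_add returns the previous value, i.e. the first claimed row.
            const std::size_t first = nextRow.fetch_add(chunk);
            if (first >= a.numRows) break;
            const std::size_t last = std::min(first + chunk, a.numRows);
            for (std::size_t r = first; r < last; ++r) {
                double sum = 0.0;  // dot product of row r with x
                for (std::size_t k = a.rowOffsets[r]; k < a.rowOffsets[r + 1]; ++k)
                    sum += a.values[k] * x[a.colIndices[k]];
                y[r] = sum;  // each row is written by exactly one worker
            }
        }
    };

    // Spawn numWorkers - 1 helper threads; the calling thread also works.
    std::vector<std::thread> pool;
    for (unsigned w = 1; w < numWorkers; ++w) pool.emplace_back(worker);
    worker();
    for (auto& t : pool) t.join();
    return y;
}
```

Because a worker that lands on short rows simply returns to the counter sooner, rows of very different lengths are balanced without any pre-partitioning. This is the motivation for dynamic distribution; the chunk size and worker count above are arbitrary choices for illustration.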
Note: for the time being, users can refer to my example code for the graph PageRank algorithm to learn how to use the LightSpMV template class (see files main.cu, LightSpMVCore.h and PageRankLightSpMV.cu).
- Latest release (v1.0)
More details about the changes in this version are available in the ChangeLog.
- Sparse matrices
The set of sparse matrices used in our publications.
- PageRank example code
I have implemented the graph PageRank algorithm using the following four SpMV implementations: LightSpMV, CUSP, cuSPARSE and ViennaCL. For the time being, users can read this simple example code to learn how to embed these SpMV implementations into existing code.
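For readers who want to see how PageRank reduces to repeated SpMV calls, the following is a small CPU-only sketch in plain C++. It is not the bundled example code: `CsrMatrix`, `spmv`, `buildTransitionMatrix` and `pageRank` are hypothetical names, the damping factor 0.85 is a conventional default, and the sketch assumes every node has at least one outgoing edge. Each iteration is a single call of the form y = alpha * Ax + beta * y.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CSR container; field names are illustrative.
struct CsrMatrix {
    std::size_t numRows = 0;
    std::size_t numCols = 0;
    std::vector<std::size_t> rowOffsets;
    std::vector<std::size_t> colIndices;
    std::vector<double> values;
};

// y = alpha * A * x + beta * y (sequential reference version).
void spmv(const CsrMatrix& a, const std::vector<double>& x,
          double alpha, double beta, std::vector<double>& y) {
    assert(x.size() == a.numCols && y.size() == a.numRows);
    for (std::size_t r = 0; r < a.numRows; ++r) {
        double sum = 0.0;
        for (std::size_t k = a.rowOffsets[r]; k < a.rowOffsets[r + 1]; ++k)
            sum += a.values[k] * x[a.colIndices[k]];
        y[r] = alpha * sum + beta * y[r];
    }
}

// Builds the transition matrix M with M(dst, src) = 1 / outDegree(src)
// from a list of directed edges (src, dst).
CsrMatrix buildTransitionMatrix(
    std::size_t n,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    CsrMatrix m;
    m.numRows = m.numCols = n;
    m.rowOffsets.assign(n + 1, 0);
    std::vector<std::size_t> outDegree(n, 0);
    for (const auto& e : edges) {
        assert(e.first < n && e.second < n);
        ++outDegree[e.first];
        ++m.rowOffsets[e.second + 1];  // count entries per row
    }
    for (std::size_t r = 0; r < n; ++r) m.rowOffsets[r + 1] += m.rowOffsets[r];
    m.colIndices.resize(edges.size());
    m.values.resize(edges.size());
    std::vector<std::size_t> cursor(m.rowOffsets.begin(), m.rowOffsets.end() - 1);
    for (const auto& e : edges) {
        const std::size_t pos = cursor[e.second]++;
        m.colIndices[pos] = e.first;
        m.values[pos] = 1.0 / static_cast<double>(outDegree[e.first]);
    }
    return m;
}

// Power iteration: rank <- damping * M * rank + (1 - damping) * (1/n).
std::vector<double> pageRank(const CsrMatrix& m, double damping = 0.85,
                             int maxIters = 200, double tol = 1e-12) {
    const std::size_t n = m.numRows;
    std::vector<double> rank(n, 1.0 / static_cast<double>(n));
    for (int it = 0; it < maxIters; ++it) {
        // Start from the uniform teleport vector, then apply one SpMV.
        std::vector<double> next(n, 1.0 / static_cast<double>(n));
        spmv(m, rank, damping, 1.0 - damping, next);
        double diff = 0.0;
        for (std::size_t i = 0; i < n; ++i) diff += std::fabs(next[i] - rank[i]);
        rank.swap(next);
        if (diff < tol) break;
    }
    return rank;
}
```

Swapping in a different SpMV implementation only requires replacing the body of `spmv`; the surrounding iteration stays the same.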
Citation
- Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs". 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2015), 2015, pp. 82-89.
- Yongchao Liu, Jorge Gonzalez-Dominguez, Bertil Schmidt: "Faster compressed sparse row (CSR)-based sparse matrix-vector multiplication using CUDA". GPU Technology Conference (GTC 2015), 2015
- Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CUDA-compatible sparse matrix-vector multiplication using compressed sparse rows". Journal of Signal Processing Systems, 2017, doi:10.1007/s11265-016-1216-4.
Other related papers
- Yongchao Liu, Tony Pan, Srinivas Aluru: "Parallel pairwise correlation computation on Intel Xeon Phi clusters". 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2016), 2016, pp. 141-149.
- Yongchao Liu and Srinivas Aluru: "LightScan: faster scan primitive on CUDA compatible manycore processors". arXiv:1604.04815, 2016.
- Yongchao Liu, Tony Pan, Oded Green and Srinivas Aluru: "Parallelized Kendall's tau coefficient computation via SIMD vectorized sorting on many-integrated-core processors". Journal of Parallel and Distributed Computing, 2017, under review [arXiv]
Command-line options
- -i <string> sparse matrix A file (in Matrix Market format)
- -x <string> vector X file (one element per line) [otherwise, set each element to 1.0]
- -y <string> vector Y file (one element per line) [otherwise, set each element to 0.0]
- -o <string> output file (one element per line) [otherwise, no output]
- -a <float> alpha value, default = 1
- -b <float> beta value, default = 1
- -f <int> formula used, default = 1
  - 0: y = Ax
  - 1: y = alpha * Ax + beta * y
- -r <int> select the routine to use, default = 1
  - 0: vector-based row dynamic distribution
  - 1: warp-based row dynamic distribution
- -d <int> use double-precision floating point (e.g. -d 1), default = 0
- -g <int> index of the single GPU used, default = 0
- -m <int> number of SpMV iterations, default = 1000
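The -x, -y and -o options all use a plain-text vector layout with one element per line. As a rough sketch of how such files could be read and written from your own code, assuming nothing beyond that layout, the hypothetical helpers below (`readVectorOrFill`, `writeVector`) load a vector from a stream, fall back to a constant fill value when no file is given (1.0 for X and 0.0 for Y, per the option descriptions), and write a result vector back out. Throwing on a short file is a design choice of this sketch, not documented LightSpMV behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Reads n values (one per line) from `in`. When `in` is null, returns a
// vector of n copies of `fill`, mirroring the defaults described for -x/-y.
std::vector<double> readVectorOrFill(std::istream* in, std::size_t n, double fill) {
    std::vector<double> v(n, fill);
    if (in == nullptr) return v;
    for (std::size_t i = 0; i < n; ++i) {
        double value = 0.0;
        if (!(*in >> value))
            throw std::runtime_error("vector file has fewer elements than expected");
        v[i] = value;
    }
    return v;
}

// Writes one element per line, the layout described for the -o option.
void writeVector(std::ostream& out, const std::vector<double>& v) {
    for (double e : v) out << e << '\n';
}
```

A caller would typically pass a `std::ifstream` opened on the file name given on the command line, or a null pointer when the option is absent.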
Prerequisites
- CUDA 6.5 toolkit
- CUDA-enabled GPUs with compute capability 3.0 or higher
Download and compilation
- Download the source code tarball
- Uncompress using the "tar -zxvf" command
- Type "make" to compile the program
LightSpMV accepts sparse matrices stored in the Matrix Market file format, and performs SpMV in memory using the standard CSR format.
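The conversion from Matrix Market to CSR can be sketched as follows. This is an illustrative reader, not LightSpMV's own loader: it handles only the "coordinate real general" variant (no symmetric or pattern matrices), and the names `CsrMatrix` and `readMatrixMarketAsCsr` are hypothetical. It counts the entries per row, turns the counts into row offsets with a prefix sum, and then scatters each entry into its row.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical CSR container; field names are illustrative.
struct CsrMatrix {
    std::size_t numRows = 0;
    std::size_t numCols = 0;
    std::vector<std::size_t> rowOffsets;
    std::vector<std::size_t> colIndices;
    std::vector<double> values;
};

// Reads a Matrix Market "coordinate real general" stream into CSR.
CsrMatrix readMatrixMarketAsCsr(std::istream& in) {
    // Skip the banner and comment lines (they start with '%').
    std::string line;
    do {
        if (!std::getline(in, line)) throw std::runtime_error("missing size line");
    } while (line.empty() || line[0] == '%');

    std::size_t rows = 0, cols = 0, nnz = 0;
    std::istringstream sizeLine(line);
    if (!(sizeLine >> rows >> cols >> nnz)) throw std::runtime_error("bad size line");

    // Read the coordinate entries; indices in the file are 1-based.
    std::vector<std::size_t> r(nnz), c(nnz);
    std::vector<double> v(nnz);
    CsrMatrix m;
    m.numRows = rows;
    m.numCols = cols;
    m.rowOffsets.assign(rows + 1, 0);
    for (std::size_t k = 0; k < nnz; ++k) {
        if (!(in >> r[k] >> c[k] >> v[k])) throw std::runtime_error("bad entry");
        if (r[k] < 1 || r[k] > rows || c[k] < 1 || c[k] > cols)
            throw std::runtime_error("index out of range");
        --r[k];
        --c[k];
        ++m.rowOffsets[r[k] + 1];  // count entries per row
    }

    // Prefix sum turns per-row counts into row offsets.
    for (std::size_t i = 0; i < rows; ++i) m.rowOffsets[i + 1] += m.rowOffsets[i];

    // Scatter entries into their rows, preserving file order within a row.
    m.colIndices.resize(nnz);
    m.values.resize(nnz);
    std::vector<std::size_t> cursor(m.rowOffsets.begin(), m.rowOffsets.end() - 1);
    for (std::size_t k = 0; k < nnz; ++k) {
        const std::size_t pos = cursor[r[k]]++;
        m.colIndices[pos] = c[k];
        m.values[pos] = v[k];
    }
    return m;
}
```

Matrix Market files are usually sorted by column, so after the scatter step the column indices within each CSR row come out in ascending order; an unsorted file would need an extra per-row sort if ordered columns are required.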
- ./lightspmv -i matrix.mm
- ./lightspmv -i matrix.mm -m 1 -d 1
- ./lightspmv -i matrix.mm -m 1 -f 0 -o out.y
If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.