Introduction
LightSpMV is a novel CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm using the standard compressed sparse row (CSR) storage format. This algorithm is written as CUDA C++ template classes. It achieves high speed through fine-grained dynamic distribution of matrix rows over warps/vectors based on atomic operations, as well as through efficient vector dot product computation. We have evaluated LightSpMV using various sparse matrices and further compared it to the CSR-based SpMV subprograms in the leading CUSP and cuSPARSE libraries. Performance evaluation reveals that on a single Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with a speedup of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE, for single-precision and double-precision floating point, respectively. In addition, for the acceleration of the PageRank graph application, LightSpMV consistently outperforms its three counterparts CUSP, cuSPARSE and ViennaCL.
Note: for the time being, users can refer to my example code for the graph PageRank algorithm to learn how to use the LightSpMV template class (see files main.cu, LightSpMVCore.h and PageRankLightSpMV.cu).
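The row-distribution idea can be illustrated on the CPU. The C++ sketch below is an analogue written for this README, not LightSpMV's CUDA kernel: each std::thread stands in for one warp or vector, rows are claimed from a shared std::atomic counter (mirroring the atomic-based dynamic distribution described above), and each row's dot product is organized as a hypothetical group of `lanes` that accumulate strided partial sums before a pairwise tree reduction. The names CsrMatrix, row_dot, spmv_dynamic and rows_per_grab are illustrative only.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal CSR container (hypothetical, for illustration only).
struct CsrMatrix {
    int num_rows = 0;
    std::vector<int> row_offsets;   // num_rows + 1 entries
    std::vector<int> col_indices;   // column index of each nonzero
    std::vector<double> values;     // value of each nonzero
};

// Dot product of row `row` with x, organized like a vector of `lanes` threads:
// lane k sums nonzeros k, k + lanes, k + 2*lanes, ..., and the per-lane partial
// sums are then combined pairwise (a GPU would use warp shuffles here).
double row_dot(const CsrMatrix& A, const std::vector<double>& x, int row, int lanes) {
    assert(lanes > 0 && (lanes & (lanes - 1)) == 0);  // lanes must be a power of two
    std::vector<double> partial(lanes, 0.0);
    for (int lane = 0; lane < lanes; ++lane) {
        for (int j = A.row_offsets[row] + lane; j < A.row_offsets[row + 1]; j += lanes) {
            partial[lane] += A.values[j] * x[A.col_indices[j]];
        }
    }
    for (int stride = lanes / 2; stride > 0; stride /= 2) {
        for (int lane = 0; lane < stride; ++lane) {
            partial[lane] += partial[lane + stride];
        }
    }
    return partial[0];
}

// y = A * x with dynamic row distribution: every worker thread (standing in
// for one warp/vector) repeatedly claims the next `rows_per_grab` rows from a
// shared atomic counter until no rows are left. Each row is written by exactly
// one worker, so y needs no further synchronization.
std::vector<double> spmv_dynamic(const CsrMatrix& A, const std::vector<double>& x,
                                 int num_workers, int lanes, int rows_per_grab) {
    std::vector<double> y(A.num_rows, 0.0);
    std::atomic<int> next_row{0};
    auto worker = [&]() {
        for (;;) {
            const int first = next_row.fetch_add(rows_per_grab);
            if (first >= A.num_rows) return;
            const int last = std::min(first + rows_per_grab, A.num_rows);
            for (int row = first; row < last; ++row) {
                y[row] = row_dot(A, x, row, lanes);
            }
        }
    };
    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) workers.emplace_back(worker);
    for (std::thread& t : workers) t.join();
    return y;
}
```

For example, spmv_dynamic(A, x, 4, 32, 1) lets four workers with 32 lanes each pull one row at a time. The benefit of the shared counter is that a worker finishing a short row immediately takes another one, so rows of very different lengths do not leave some workers idle while others are still busy. On a GPU, `lanes` would correspond to a vector size that divides the warp size of 32.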
Downloads
- Latest release (v1.0)
More details about the changes in this version are available in the ChangeLog.
- Sparse matrices
The set of sparse matrices used in our publications.
- PageRank example code
I have implemented the graph PageRank algorithm using the following four SpMV implementations: LightSpMV, CUSP, cuSPARSE and ViennaCL. For the time being, users can read this simple example code to learn how to embed these SpMV implementations into existing code.
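PageRank maps naturally onto repeated SpMV. The C++ sketch below is a plain CPU reference, not the code in PageRankLightSpMV.cu and not the LightSpMV template API. It builds the transition matrix M in CSR form from an edge list (row dst holds 1 / outdegree(src) for every edge src -> dst) and runs the power iteration r_new = d * M * r + (1 - d) / n. Each step is written as the y = alpha * Ax + beta * y formula with alpha = d and beta = 1, after pre-filling y with the teleport term. It assumes every vertex has at least one outgoing edge (dangling vertices are not handled), and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct CsrMatrix {
    int num_rows = 0;
    std::vector<int> row_offsets;   // num_rows + 1 entries
    std::vector<int> col_indices;
    std::vector<double> values;
};

// y = alpha * A * x + beta * y (the "-f 1" formula).
void csr_axpby(const CsrMatrix& A, const std::vector<double>& x,
               double alpha, double beta, std::vector<double>& y) {
    for (int i = 0; i < A.num_rows; ++i) {
        double dot = 0.0;
        for (int j = A.row_offsets[i]; j < A.row_offsets[i + 1]; ++j) {
            dot += A.values[j] * x[A.col_indices[j]];
        }
        y[i] = alpha * dot + beta * y[i];
    }
}

// Transition matrix in CSR: row `dst` holds 1 / outdegree(src) for every
// edge src -> dst. Assumes every vertex has at least one outgoing edge.
CsrMatrix transition_matrix(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<int> outdeg(n, 0);
    CsrMatrix M;
    M.num_rows = n;
    M.row_offsets.assign(n + 1, 0);
    for (const auto& e : edges) {
        ++outdeg[e.first];
        ++M.row_offsets[e.second + 1];   // count entries per destination row
    }
    for (int i = 0; i < n; ++i) M.row_offsets[i + 1] += M.row_offsets[i];
    M.col_indices.resize(edges.size());
    M.values.resize(edges.size());
    std::vector<int> next(M.row_offsets.begin(), M.row_offsets.end() - 1);
    for (const auto& e : edges) {
        const int pos = next[e.second]++;
        M.col_indices[pos] = e.first;
        M.values[pos] = 1.0 / outdeg[e.first];
    }
    return M;
}

// Power iteration r <- d * M * r + (1 - d) / n: y is pre-filled with the
// teleport term so each step is a single alpha/beta SpMV with beta = 1.
std::vector<double> pagerank(const CsrMatrix& M, double damping, int iterations) {
    const int n = M.num_rows;
    std::vector<double> r(n, 1.0 / n), y(n);
    for (int it = 0; it < iterations; ++it) {
        y.assign(n, (1.0 - damping) / n);
        csr_axpby(M, r, damping, 1.0, y);
        r.swap(y);
    }
    return r;
}
```

In a GPU version, only csr_axpby would change: it is the single SpMV call per iteration that a library such as LightSpMV, CUSP, cuSPARSE or ViennaCL would replace.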
Citation
- Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs". 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2015), 2015, pp. 82-89.
- Yongchao Liu, Jorge Gonzalez-Dominguez and Bertil Schmidt: "Faster compressed sparse row (CSR)-based sparse matrix-vector multiplication using CUDA". GPU Technology Conference (GTC 2015), 2015.
- Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CUDA-compatible sparse matrix-vector multiplication using compressed sparse rows". Journal of Signal Processing Systems, 2017, doi:10.1007/s11265-016-1216-4.
Other related papers
- Yongchao Liu, Tony Pan, Srinivas Aluru: "Parallel pairwise correlation computation on Intel Xeon Phi clusters". 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2016), 2016, pp. 141-149.
- Yongchao Liu and Srinivas Aluru: "LightScan: faster scan primitive on CUDA compatible manycore processors". arXiv:1604.04815, 2016.
- Yongchao Liu, Tony Pan, Oded Green and Srinivas Aluru: "Parallelized Kendall's tau coefficient computation via SIMD vectorized sorting on many-integrated-core processors". Journal of Parallel and Distributed Computing, 2017, under review [arXiv]
Parameters
Input:
- -i <string> sparse matrix A file (in Matrix Market format)
- -x <string> vector X file (one element per line) [otherwise, set each element to 1.0]
- -y <string> vector Y file (one element per line) [otherwise, set each element to 0.0]
Output:
- -o <string> output file (one element per line) [otherwise, no output]
Compute:
- -a <float> alpha value, default = 1
- -b <float> beta value, default = 1
- -f <int> formula used, default = 1
  - 0: y = Ax
  - 1: y = alpha * Ax + beta * y
- -r <int> select the routine to use, default = 1
  - 0: vector-based row dynamic distribution
  - 1: warp-based row dynamic distribution
- -d <int> use double-precision floating point (0: single precision, 1: double precision), default = 0
- -g <int> index of the single GPU used, default = 0
- -m <int> number of SpMV iterations, default = 1000
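To check a result written with -o, a small CPU reference for the two -f formulas can help. The C++ sketch below is an illustration written for this README rather than LightSpMV code. load_vector mirrors the documented -x/-y conventions (one element per line; otherwise every element is 1.0 for X or 0.0 for Y), and reference_spmv applies formula 0 or 1 to a CSR matrix in double precision. The names CsrMatrix, load_vector and reference_spmv are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

struct CsrMatrix {
    int num_rows = 0;
    std::vector<int> row_offsets;   // num_rows + 1 entries
    std::vector<int> col_indices;
    std::vector<double> values;
};

// Mirrors the documented -x / -y conventions: one element per line in the
// file; without a file, every element is set to `fill` (1.0 for X, 0.0 for Y).
std::vector<double> load_vector(const std::string& path, int n, double fill) {
    std::vector<double> v(n, fill);
    if (path.empty()) return v;
    std::ifstream in(path);
    for (int i = 0; i < n; ++i) {
        if (!(in >> v[i])) {
            throw std::runtime_error("cannot read element " + std::to_string(i) + " of " + path);
        }
    }
    return v;
}

// formula 0: y = A x
// formula 1: y = alpha * A x + beta * y
void reference_spmv(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y,
                    double alpha, double beta, int formula) {
    for (int i = 0; i < A.num_rows; ++i) {
        double dot = 0.0;
        for (int j = A.row_offsets[i]; j < A.row_offsets[i + 1]; ++j) {
            dot += A.values[j] * x[A.col_indices[j]];
        }
        y[i] = (formula == 0) ? dot : alpha * dot + beta * y[i];
    }
}
```

With the listed defaults (-f 1, -a 1, -b 1 and Y filled with 0.0), formula 1 reduces to y = Ax, so both formulas give the same result unless -y, -a or -b is supplied. Comparing against a single-iteration run (-m 1) keeps the comparison simple.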
Installation and Usage
Prerequisites
- CUDA 6.5 toolkit
- CUDA-enabled GPUs with compute capability 3.0 or higher
Downloading and compiling
- Download the source code tarball
- Uncompress using the "tar -zxvf" command
- Run "make" to compile the program
Typical Usage
LightSpMV accepts sparse matrices stored in the Matrix Market file format and performs SpMV in memory using the standard CSR format.
- ./lightspmv -i matrix.mm
- ./lightspmv -i matrix.mm -m 1 -d 1
- ./lightspmv -i matrix.mm -m 1 -f 0 -o out.y
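LightSpMV reads Matrix Market input and builds the CSR arrays in memory. How it does this internally is not described here; the C++ sketch below only shows one common way to perform that conversion. It parses the banner, skips % comment lines, reads the size line, converts the 1-based coordinate triplets to 0-based indices, expands symmetric storage, and buckets entries by row with a counting pass. It handles coordinate files with real, integer or pattern values and general, symmetric or skew-symmetric storage. Array-format and complex files are rejected, and indices are not bounds-checked. Function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct CsrMatrix {
    int num_rows = 0, num_cols = 0;
    std::vector<int> row_offsets;   // num_rows + 1 entries
    std::vector<int> col_indices;
    std::vector<double> values;
};

// Converts a Matrix Market "coordinate" stream to CSR (no bounds checking).
CsrMatrix read_matrix_market(std::istream& in) {
    std::string line;
    if (!std::getline(in, line) || line.rfind("%%MatrixMarket", 0) != 0)
        throw std::runtime_error("missing %%MatrixMarket banner");
    for (char& ch : line) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    if (line.find("coordinate") == std::string::npos || line.find("complex") != std::string::npos)
        throw std::runtime_error("only real/integer/pattern coordinate files are handled");
    const bool pattern = line.find("pattern") != std::string::npos;
    const bool skew = line.find("skew-symmetric") != std::string::npos;
    const bool symmetric = line.find("symmetric") != std::string::npos;  // also true for skew

    // Skip comment and blank lines up to the "rows cols nonzeros" size line.
    do {
        if (!std::getline(in, line)) throw std::runtime_error("missing size line");
    } while (line.empty() || line[0] == '%');
    CsrMatrix A;
    long long nnz = 0;
    std::istringstream size_line(line);
    if (!(size_line >> A.num_rows >> A.num_cols >> nnz)) throw std::runtime_error("bad size line");

    // Read 1-based (row, col[, value]) triplets; mirror off-diagonal entries
    // when only one triangle is stored.
    std::vector<int> rows, cols;
    std::vector<double> vals;
    for (long long k = 0; k < nnz; ++k) {
        int r = 0, c = 0;
        double v = 1.0;  // pattern matrices store no values
        if (!(in >> r >> c) || (!pattern && !(in >> v)))
            throw std::runtime_error("truncated entry list");
        rows.push_back(r - 1); cols.push_back(c - 1); vals.push_back(v);
        if (symmetric && r != c) {
            rows.push_back(c - 1); cols.push_back(r - 1); vals.push_back(skew ? -v : v);
        }
    }

    // Counting pass: row_offsets[i + 1] first holds the length of row i,
    // then a prefix sum turns the lengths into start offsets.
    A.row_offsets.assign(A.num_rows + 1, 0);
    for (int r : rows) ++A.row_offsets[r + 1];
    for (int i = 0; i < A.num_rows; ++i) A.row_offsets[i + 1] += A.row_offsets[i];
    A.col_indices.resize(rows.size());
    A.values.resize(rows.size());
    std::vector<int> next(A.row_offsets.begin(), A.row_offsets.end() - 1);
    for (std::size_t k = 0; k < rows.size(); ++k) {
        const int pos = next[rows[k]]++;
        A.col_indices[pos] = cols[k];
        A.values[pos] = vals[k];
    }
    return A;
}

// Convenience wrapper for in-memory text (handy for small tests).
CsrMatrix read_matrix_market_string(const std::string& text) {
    std::istringstream in(text);
    return read_matrix_market(in);
}
```

Within each row, nonzeros keep the order in which they appear in the file; a kernel that needs sorted column indices would add a per-row sort after the counting pass.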
Contact
If you have any questions or suggestions for improvement, please feel free to contact Yongchao Liu.