LightSpMV - Faster GPU-based Sparse Matrix-Vector Multiplication

Introduction

LightSpMV is a novel CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm using the standard compressed sparse row (CSR) storage format. The algorithm is written as CUDA C++ template classes and achieves high speed through fine-grained dynamic distribution of matrix rows over warps/vectors, based on atomic operations, together with efficient vector dot product computation. We have evaluated LightSpMV on various sparse matrices and compared it to the CSR-based SpMV subprograms in the leading CUSP and cuSPARSE libraries. The evaluation shows that on a single Tesla K40c GPU, LightSpMV outperforms both, with speedups of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE, for single-precision and double-precision floating point, respectively. In addition, when accelerating the PageRank graph application, LightSpMV remains consistently faster than its counterparts.
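The row-distribution idea described above can be illustrated on the CPU. The following is a minimal sketch, not the LightSpMV kernel itself: the names Csr, process_next_row and spmv_dynamic are hypothetical, and the "workers" (standing in for warps) are stepped in a round-robin loop so the example stays single-threaded and deterministic. The point it shows is that no rows are assigned up front; each worker claims its next row from a shared atomic counter.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical CSR container used only for this sketch.
struct Csr {
    int rows;
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_idx;   // size nnz
    std::vector<double> vals;   // size nnz
};

// One worker step: atomically claim the next unprocessed row and compute
// its dot product with x. Returns false once all rows have been claimed.
bool process_next_row(const Csr& a, const std::vector<double>& x,
                      std::vector<double>& y, std::atomic<int>& next_row) {
    const int row = next_row.fetch_add(1);  // dynamic row assignment
    if (row >= a.rows) return false;
    double sum = 0.0;
    for (int k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
        sum += a.vals[k] * x[a.col_idx[k]];
    }
    y[row] = sum;
    return true;
}

// Computes y = A * x and returns how many rows each simulated worker handled.
// On a GPU the workers would run concurrently; here they simply take turns.
std::vector<int> spmv_dynamic(const Csr& a, const std::vector<double>& x,
                              std::vector<double>& y, int num_workers) {
    assert(num_workers > 0);
    std::atomic<int> next_row{0};
    std::vector<int> rows_done(num_workers, 0);
    bool any_progress = true;
    while (any_progress) {
        any_progress = false;
        for (int w = 0; w < num_workers; ++w) {
            if (process_next_row(a, x, y, next_row)) {
                ++rows_done[w];
                any_progress = true;
            }
        }
    }
    return rows_done;
}
```

Because process_next_row only touches the shared counter through fetch_add and writes a distinct y[row], the same function could in principle be called from real threads; that variant is not shown here.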

Note: for the present, users can refer to my example code for the graph PageRank algorithm to see how to use the LightSpMV template class (see files main.cu, LightSpMVCore.h and PageRankLightSpMV.cu).

Downloads

Latest release (v1.0)
More details about the changes in this version are available at ChangeLog.
Sparse matrices
The set of sparse matrices used in our publications.
PageRank example code
I have implemented the graph PageRank algorithm using the following four SpMV implementations: LightSpMV, CUSP, cuSPARSE and ViennaCL. For now, users can read this simple example code to see how to embed these SpMV implementations into existing code.
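To make the link between PageRank and SpMV concrete, here is a minimal CPU sketch of PageRank as a power iteration whose inner step is one CSR SpMV. It is not the downloadable example code. The names Csr, spmv and pagerank are hypothetical, the damping factor is a caller-supplied assumption, and the matrix is assumed to be column-stochastic already (entry (i, j) is 1/outdegree(j) when page j links to page i, with no dangling pages).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical CSR container used only for this sketch.
struct Csr {
    int rows;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> vals;
};

// Plain CSR SpMV: returns A * x. This is the step an SpMV library replaces.
std::vector<double> spmv(const Csr& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (int row = 0; row < a.rows; ++row) {
        for (int k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
            y[row] += a.vals[k] * x[a.col_idx[k]];
        }
    }
    return y;
}

// Power iteration: rank <- damping * A * rank + (1 - damping) / n,
// stopped after max_iters or when the L1 change drops below tol.
std::vector<double> pagerank(const Csr& a, double damping,
                             int max_iters, double tol) {
    assert(a.rows > 0);
    const double n = static_cast<double>(a.rows);
    std::vector<double> rank(a.rows, 1.0 / n);
    for (int it = 0; it < max_iters; ++it) {
        std::vector<double> next = spmv(a, rank);
        double diff = 0.0;
        for (int i = 0; i < a.rows; ++i) {
            next[i] = damping * next[i] + (1.0 - damping) / n;
            diff += std::fabs(next[i] - rank[i]);
        }
        rank.swap(next);
        if (diff < tol) break;
    }
    return rank;
}
```

Swapping in a different SpMV implementation then only means replacing the body of spmv; the surrounding iteration stays the same.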

Citation

Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs". 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2015), 2015, pp. 82-89
Yongchao Liu, Jorge Gonzalez-Dominguez, Bertil Schmidt: "Faster compressed sparse row (CSR)-based sparse matrix-vector multiplication using CUDA". GPU Technology Conference (GTC 2015), 2015
Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CUDA-compatible sparse matrix-vector multiplication using compressed sparse rows". Journal of Signal Processing Systems, 2017, doi:10.1007/s11265-016-1216-4.

Other related papers

Yongchao Liu, Tony Pan, Srinivas Aluru: "Parallel pairwise correlation computation on Intel Xeon Phi clusters". 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2016), 2016, pp. 141-149.
Yongchao Liu and Srinivas Aluru: "LightScan: faster scan primitive on CUDA compatible manycore processors". arXiv:1604.04815, 2016.
Yongchao Liu, Tony Pan, Oded Green and Srinivas Aluru: "Parallelized Kendall's tau coefficient computation via SIMD vectorized sorting on many-integrated-core processors". Journal of Parallel and Distributed Computing, 2017, under review [arXiv]

Parameters

Input:

-i <string> sparse matrix A file (in Matrix Market format)
-x <string> vector X file (one element per line) [otherwise, set each element to 1.0]
-y <string> vector Y file (one element per line) [otherwise, set each element to 0.0]
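The vector file layout above (one element per line, with a default fill when no file is given) could be read along the following lines. This is a sketch under assumptions, not the program's actual loader: load_vector is a hypothetical name, and rejecting a file that holds fewer elements than expected is a choice made here, since the source does not say how that case is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Reads n whitespace-separated values from *in, or returns n copies of
// `fallback` when no stream is supplied (mirrors the "otherwise" defaults).
std::vector<double> load_vector(std::istream* in, std::size_t n,
                                double fallback) {
    if (in == nullptr) {
        return std::vector<double>(n, fallback);
    }
    std::vector<double> v;
    v.reserve(n);
    double value = 0.0;
    while (v.size() < n && (*in >> value)) {
        v.push_back(value);
    }
    if (v.size() != n) {
        // Assumption: a short file is treated as an error.
        throw std::runtime_error("vector file has fewer elements than expected");
    }
    return v;
}
```

With this sketch, X would be loaded with a fallback of 1.0 and Y with a fallback of 0.0, matching the defaults listed above; a std::ifstream or std::istringstream can be passed by address.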

Output:

-o <string> output file (one element per line) [otherwise, no output]

Compute:

-a <float> alpha value, default = 1
-b <float> beta value, default = 1
-f <int> formula used, default = 1
- 0: y = Ax
- 1: y = alpha * Ax + beta * y
-r <int> select the routine to use, default = 1
- 0: vector-based row dynamic distribution
- 1: warp-based row dynamic distribution
-d <int> double-precision floating point, default = 0
-g <int> index of the single GPU used, default = 0
-m <int> number of SpMV iterations, default = 1000
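The two formulas selected by -f can be written out as a sequential CPU reference, which is handy for checking GPU results on small matrices. This is a sketch only: Csr and spmv_reference are hypothetical names, not part of LightSpMV, and the integer formula argument merely mirrors the 0/1 values of the option.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR container used only for this sketch.
struct Csr {
    int rows;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> vals;
};

// formula 0: y = A * x
// formula 1: y = alpha * A * x + beta * y   (y is read and then overwritten)
void spmv_reference(const Csr& a, const std::vector<double>& x,
                    std::vector<double>& y, double alpha, double beta,
                    int formula) {
    assert(formula == 0 || formula == 1);
    for (int row = 0; row < a.rows; ++row) {
        double sum = 0.0;
        for (int k = a.row_ptr[row]; k < a.row_ptr[row + 1]; ++k) {
            sum += a.vals[k] * x[a.col_idx[k]];
        }
        y[row] = (formula == 0) ? sum : alpha * sum + beta * y[row];
    }
}
```

Note that with formula 1 the previous contents of y feed into the result, so repeating the call changes y each time, whereas formula 0 gives the same y on every call.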

Installation and Usage

Prerequisites

CUDA 6.5 toolkit
CUDA-enabled GPUs with compute capability 3.0 or higher

Downloading and compiling

Download the source code tarball
Uncompress using the "tar -zxvf" command
Type command "make" to compile the program

Typical Usage

LightSpMV accepts sparse matrices stored in the Matrix Market file format and performs SpMV in memory using the standard CSR format.

./lightspmv -i matrix.mm
./lightspmv -i matrix.mm -m 1 -d 1
./lightspmv -i matrix.mm -m 1 -f 0 -o out.y
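For readers unfamiliar with the step from a Matrix Market file to CSR, the conversion can be sketched as follows. This is not LightSpMV's own reader: Csr and read_matrix_market are hypothetical names, and the sketch assumes a "coordinate real general" file (1-based "row column value" triples after the size line), so symmetric, pattern and array variants are not handled.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical CSR container used only for this sketch.
struct Csr {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> vals;
};

Csr read_matrix_market(std::istream& in) {
    // Skip the banner and comment lines, which all start with '%'.
    std::string line;
    bool found_size_line = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] != '%') { found_size_line = true; break; }
    }
    if (!found_size_line) throw std::runtime_error("missing size line");

    int rows = 0, cols = 0, nnz = 0;
    std::istringstream size_line(line);
    if (!(size_line >> rows >> cols >> nnz) || rows < 0 || cols < 0 || nnz < 0) {
        throw std::runtime_error("malformed size line");
    }

    // Read the coordinate triples and count the entries of each row.
    std::vector<int> r(nnz), c(nnz);
    std::vector<double> v(nnz);
    Csr a;
    a.rows = rows;
    a.cols = cols;
    a.row_ptr.assign(rows + 1, 0);
    for (int k = 0; k < nnz; ++k) {
        if (!(in >> r[k] >> c[k] >> v[k])) throw std::runtime_error("truncated entries");
        --r[k];  // Matrix Market indices are 1-based
        --c[k];
        if (r[k] < 0 || r[k] >= rows || c[k] < 0 || c[k] >= cols) {
            throw std::runtime_error("index out of range");
        }
        ++a.row_ptr[r[k] + 1];
    }
    // A prefix sum turns per-row counts into row start offsets.
    for (int i = 0; i < rows; ++i) a.row_ptr[i + 1] += a.row_ptr[i];

    // Scatter entries into place; within a row, file order is preserved.
    a.col_idx.resize(nnz);
    a.vals.resize(nnz);
    std::vector<int> next(a.row_ptr.begin(), a.row_ptr.end() - 1);
    for (int k = 0; k < nnz; ++k) {
        const int dst = next[r[k]]++;
        a.col_idx[dst] = c[k];
        a.vals[dst] = v[k];
    }
    return a;
}
```

The function takes any std::istream, so it can be fed an std::ifstream opened on a file such as matrix.mm or an std::istringstream holding a small test matrix.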

Change Log

Contact

If you have any questions or suggestions for improvement, please feel free to contact Yongchao Liu.
