LightSpMV is a novel CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm that uses the standard compressed sparse row (CSR) storage format. The algorithm is implemented as CUDA C++ template classes. It achieves high speed by combining two techniques: fine-grained dynamic distribution of matrix rows over warps/vectors, based on atomic operations, and efficient vector dot product computation. We have evaluated LightSpMV on a variety of sparse matrices and compared it with the CSR-based SpMV subprograms in the leading CUSP and cuSPARSE libraries. On a single Tesla K40c GPU, LightSpMV outperforms both CUSP and cuSPARSE. Its speedups over CUSP reach up to 2.60 (single precision) and 2.63 (double precision), and its speedups over cuSPARSE reach up to 1.93 (single precision) and 1.79 (double precision). In addition, when accelerating the PageRank graph application, LightSpMV remains consistently faster than both counterparts.
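
The following is a minimal CPU analogue of dynamic row distribution over CSR rows, written in standard C++. It is not LightSpMV's CUDA kernel. Worker threads repeatedly claim a batch of rows from one shared atomic counter and compute each claimed row's dot product with x. In LightSpMV, the claiming unit is a vector or warp of GPU threads rather than a CPU thread. The names `CsrMatrix`, `spmvDynamic` and `rowsPerClaim` are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical CSR container (0-based indices); not LightSpMV's own type.
struct CsrMatrix {
    std::size_t numRows = 0;
    std::vector<std::size_t> rowOffsets; // size numRows + 1
    std::vector<std::size_t> colIndices; // size nnz
    std::vector<float> values;           // size nnz
};

// Computes y = A * x with dynamic row distribution: every worker repeatedly
// claims the next batch of unprocessed rows through one shared atomic counter.
void spmvDynamic(const CsrMatrix& a, const std::vector<float>& x,
                 std::vector<float>& y, unsigned numWorkers,
                 std::size_t rowsPerClaim = 4) {
    rowsPerClaim = std::max<std::size_t>(1, rowsPerClaim); // avoid claiming 0 rows
    y.assign(a.numRows, 0.0f);
    std::atomic<std::size_t> nextRow{0};

    auto worker = [&]() {
        for (;;) {
            // Atomically reserve rows [first, first + rowsPerClaim).
            const std::size_t first = nextRow.fetch_add(rowsPerClaim);
            if (first >= a.numRows) return; // no rows left
            const std::size_t last = std::min(first + rowsPerClaim, a.numRows);
            for (std::size_t r = first; r < last; ++r) {
                // Dot product of row r with x; each y[r] has exactly one writer.
                float sum = 0.0f;
                for (std::size_t k = a.rowOffsets[r]; k < a.rowOffsets[r + 1]; ++k)
                    sum += a.values[k] * x[a.colIndices[k]];
                y[r] = sum;
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < std::max(1u, numWorkers); ++t) pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();
}
```

Claiming rows at run time lets a worker that finishes short rows move on to new work immediately. A fixed range assigned to each worker up front cannot do this. Dynamic claiming therefore helps balance the load when row lengths vary widely.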

Note: for the present time, users can refer to my example code for the graph PageRank algorithm to learn how to use the LightSpMV template class (see files LigthSpMVCore.h and
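
PageRank maps naturally onto repeated SpMV. The following sequential C++ sketch is not the PageRank example shipped with LightSpMV. In each power iteration, it performs one CSR SpMV with a column-normalized adjacency matrix, where entry (i, j) is 1/outDegree(j) for every edge j -> i. It then applies the damping factor and spreads the rank held by dangling nodes uniformly over all nodes. The names `spmv` and `pageRank` are hypothetical, and so are the defaults (damping 0.85, tolerance, iteration cap).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR container (row-major, 0-based indices).
struct CsrMatrix {
    std::size_t numRows = 0;
    std::vector<std::size_t> rowOffsets; // size numRows + 1
    std::vector<std::size_t> colIndices; // size nnz
    std::vector<double> values;          // size nnz
};

// Sequential CSR SpMV: returns A * x.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.numRows, 0.0);
    for (std::size_t r = 0; r < a.numRows; ++r)
        for (std::size_t k = a.rowOffsets[r]; k < a.rowOffsets[r + 1]; ++k)
            y[r] += a.values[k] * x[a.colIndices[k]];
    return y;
}

// PageRank by power iteration. `a` must hold A(i, j) = 1 / outDegree[j]
// for every edge j -> i, so each iteration costs one SpMV.
std::vector<double> pageRank(const CsrMatrix& a,
                             const std::vector<std::size_t>& outDegree,
                             double damping = 0.85, double tol = 1e-12,
                             int maxIter = 200) {
    const std::size_t n = a.numRows;
    std::vector<double> rank(n, 1.0 / static_cast<double>(n));
    for (int it = 0; it < maxIter; ++it) {
        // Rank held by dangling nodes (no out-edges) is redistributed uniformly.
        double dangling = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            if (outDegree[j] == 0) dangling += rank[j];

        std::vector<double> next = spmv(a, rank);
        double diff = 0.0; // L1 change between successive iterates
        for (std::size_t i = 0; i < n; ++i) {
            next[i] = (1.0 - damping) / n + damping * (next[i] + dangling / n);
            diff += std::fabs(next[i] - rank[i]);
        }
        rank.swap(next);
        if (diff < tol) break;
    }
    return rank;
}
```

Almost all of the work in each iteration is the single `spmv` call. This is why a faster SpMV kernel translates directly into a faster PageRank.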



Other related papers





Installation and Usage

Requirements

  1. CUDA 6.5 toolkit
  2. CUDA-enabled GPUs with compute capability 3.0 or higher

Download and Compilation

  1. Download the source code tarball
  2. Uncompress using the "tar -zxvf" command
  3. Run the "make" command to compile the program

Typical Usage

LightSpMV accepts sparse matrices stored in the Matrix Market file format and performs SpMV in memory using the standard CSR format.

  1. ./lightspmv -i
  2. ./lightspmv -i -m 1 -d 1
  3. ./lightspmv -i -m 1 -f 0 -o out.y
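
LightSpMV reads Matrix Market files and converts them to CSR in memory. The following C++ sketch shows one way to perform that conversion. It is not LightSpMV's own loader, and the names `readMatrixMarketToCsr` and `CsrMatrix` are hypothetical. It handles only `coordinate` matrices with `real` or `integer` values and `general` or `symmetric` storage. It also assumes lowercase banner keywords.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical CSR container (0-based indices).
struct CsrMatrix {
    std::size_t numRows = 0, numCols = 0;
    std::vector<std::size_t> rowOffsets, colIndices;
    std::vector<double> values;
};

CsrMatrix readMatrixMarketToCsr(std::istream& in) {
    // Banner: %%MatrixMarket matrix coordinate <field> <symmetry>
    std::string line, tag, object, format, field, symmetry;
    std::getline(in, line);
    std::istringstream banner(line);
    banner >> tag >> object >> format >> field >> symmetry;
    if (tag != "%%MatrixMarket" || format != "coordinate" ||
        (field != "real" && field != "integer") ||
        (symmetry != "general" && symmetry != "symmetric"))
        throw std::runtime_error("unsupported Matrix Market header");
    const bool symmetric = (symmetry == "symmetric");

    // Skip comment lines, then read "rows cols entries".
    while (std::getline(in, line) && (line.empty() || line[0] == '%')) {}
    std::istringstream sizes(line);
    std::size_t rows = 0, cols = 0, entries = 0;
    if (!(sizes >> rows >> cols >> entries))
        throw std::runtime_error("missing size line");

    // Read 1-based (i, j, value) triplets; mirror off-diagonals if symmetric.
    std::vector<std::size_t> ri, ci;
    std::vector<double> vi;
    for (std::size_t k = 0; k < entries; ++k) {
        std::size_t i = 0, j = 0;
        double v = 0.0;
        if (!(in >> i >> j >> v) || i < 1 || i > rows || j < 1 || j > cols)
            throw std::runtime_error("bad entry");
        ri.push_back(i - 1); ci.push_back(j - 1); vi.push_back(v);
        if (symmetric && i != j) { ri.push_back(j - 1); ci.push_back(i - 1); vi.push_back(v); }
    }

    // Counting sort by row: count, prefix-sum into offsets, then scatter.
    CsrMatrix a;
    a.numRows = rows;
    a.numCols = cols;
    a.rowOffsets.assign(rows + 1, 0);
    for (std::size_t r : ri) ++a.rowOffsets[r + 1];
    for (std::size_t r = 0; r < rows; ++r) a.rowOffsets[r + 1] += a.rowOffsets[r];
    a.colIndices.resize(ri.size());
    a.values.resize(ri.size());
    std::vector<std::size_t> cursor(a.rowOffsets.begin(), a.rowOffsets.end() - 1);
    for (std::size_t k = 0; k < ri.size(); ++k) {
        const std::size_t dst = cursor[ri[k]]++;
        a.colIndices[dst] = ci[k];
        a.values[dst] = vi[k];
    }
    return a;
}
```

Within each row, entries keep their order of appearance in the file. Duplicate coordinates are not merged.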

Change Log


If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.