## Introduction

**LightSpMV** is a novel CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm using the standard compressed sparse row (CSR) storage format. The algorithm is written as CUDA C++ template classes and achieves high speed through two techniques: fine-grained dynamic distribution of matrix rows over warps/vectors based on atomic operations, and efficient vector dot product computation. We have evaluated LightSpMV using various sparse matrices and compared it to the CSR-based SpMV subprograms in the leading CUSP and cuSPARSE libraries. The evaluation shows that on a single Tesla K40c GPU, LightSpMV outperforms both, with speedups of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE, for single-precision and double-precision floating point, respectively. In addition, when used to accelerate the PageRank graph application, LightSpMV remains consistently faster than its counterparts.
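To make the idea of dynamic row distribution concrete, here is a minimal CPU analogue in standard C++. It is not the LightSpMV CUDA kernel: CPU threads stand in for warps/vectors, a `std::atomic` counter stands in for the atomic operation on the GPU, and the function name `spmv_dynamic_rows` is hypothetical. Each worker claims the next unprocessed row from the shared counter, so workers that receive short rows simply come back for more work.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Computes y = A*x for a CSR matrix, handing out rows dynamically.
// CPU analogue only: threads play the role of warps/vectors.
void spmv_dynamic_rows(const std::vector<std::size_t>& row_ptr,
                       const std::vector<std::size_t>& col_idx,
                       const std::vector<float>& values,
                       const std::vector<float>& x,
                       std::vector<float>& y,
                       unsigned num_workers = 4) {
    assert(!row_ptr.empty());
    const std::size_t rows = row_ptr.size() - 1;
    assert(y.size() == rows);

    std::atomic<std::size_t> next_row{0};  // shared row counter

    auto worker = [&]() {
        for (;;) {
            // Atomically claim the next unprocessed row.
            const std::size_t r =
                next_row.fetch_add(1, std::memory_order_relaxed);
            if (r >= rows) break;  // no rows left
            float dot = 0.0f;
            for (std::size_t k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
                dot += values[k] * x[col_idx[k]];
            }
            y[r] = dot;  // each row is written by exactly one worker
        }
    };

    std::vector<std::thread> pool;
    const unsigned n = std::max(1u, num_workers);
    for (unsigned w = 0; w < n; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();  // join makes all writes to y visible
}
```

Compared with a static split of rows across workers, the shared counter costs one atomic operation per row but keeps the load balanced when row lengths vary widely.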

*Note: for the time being, users can refer to my example code for the graph PageRank algorithm to learn how to use the LightSpMV template classes (see files main.cu, LightSpMVCore.h and PageRankLightSpMV.cu).*

## Downloads

- Latest release (v1.0)
More details about the changes in this version are available at ChangeLog.

- Sparse matrices
The set of sparse matrices used in our publications.

- PageRank example code
I have implemented the graph PageRank algorithm using the following four SpMV implementations: LightSpMV, CUSP, cuSPARSE and ViennaCL. For the time being, users can read this simple example code to learn how to embed these SpMV implementations into existing code.

## Citation

- Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs". 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (**ASAP 2015**), 2015, pp. 82-89.
- Yongchao Liu, Jorge Gonzalez-Dominguez, Bertil Schmidt: "Faster compressed sparse row (CSR)-based sparse matrix-vector multiplication using CUDA". GPU Technology Conference (**GTC 2015**), 2015.
- Yongchao Liu and Bertil Schmidt: "LightSpMV: faster CUDA-compatible sparse matrix-vector multiplication using compressed sparse rows". **Journal of Signal Processing Systems**, 2017, doi:10.1007/s11265-016-1216-4.

### Other related papers

- Yongchao Liu, Tony Pan, Srinivas Aluru: "Parallel pairwise correlation computation on Intel Xeon Phi clusters". 28th International Symposium on Computer Architecture and High Performance Computing (**SBAC-PAD 2016**), 2016, pp. 141-149.
- Yongchao Liu and Srinivas Aluru: "LightScan: faster scan primitive on CUDA compatible manycore processors". **arXiv:1604.04815**, 2016.
- Yongchao Liu, Tony Pan, Oded Green and Srinivas Aluru: "Parallelized Kendall's tau coefficient computation via SIMD vectorized sorting on many-integrated-core processors". **Journal of Parallel and Distributed Computing**, 2017, under review [arXiv]

## Parameters

### Input:

- -i <string> sparse matrix A file (in Matrix Market format)
- -x <string> vector X file (one element per line) [otherwise, set each element to 1.0]
- -y <string> vector Y file (one element per line) [otherwise, set each element to 0.0]

### Output:

- -o <string> output file (one element per line) [otherwise, no output]

### Compute:

- -a <float> alpha value, default = 1
- -b <float> beta value, default = 1
- -f <int> formula used, default = 1
  - 0: y = Ax
  - 1: y = alpha * Ax + beta * y
- -r <int> select the routine to use, default = 1
  - 0: vector-based row dynamic distribution
  - 1: warp-based row dynamic distribution

- -d <int> double-precision floating point, default = 0
- -g <int> index of the single GPU used, default = 0
- -m <int> number of SpMV iterations, default = 1000
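For reference, the two formulas selected by `-f` can be written as a plain sequential CSR loop. This is a minimal sketch for checking results on small inputs, not the LightSpMV kernel; the `CsrMatrix` struct and `spmv_reference` function are hypothetical names that do not come from the LightSpMV sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR container (hypothetical, for illustration only).
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<double> values;        // size nnz
};

// formula 0: y = A*x
// formula 1: y = alpha * A*x + beta * y
void spmv_reference(const CsrMatrix& a, const std::vector<double>& x,
                    std::vector<double>& y, int formula = 1,
                    double alpha = 1.0, double beta = 1.0) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(x.size() == a.cols && y.size() == a.rows);
    for (std::size_t r = 0; r < a.rows; ++r) {
        // Dot product of row r with x, visiting only stored non-zeros.
        double dot = 0.0;
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            dot += a.values[k] * x[a.col_idx[k]];
        }
        y[r] = (formula == 0) ? dot : alpha * dot + beta * y[r];
    }
}
```

Note that with formula 1 and a non-zero beta, y accumulates across repeated calls, so the result after several iterations depends on the iteration count.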

## Installation and Usage

### Prerequisites

- CUDA 6.5 toolkit
- CUDA-enabled GPUs with compute capability 3.0 or higher

### Download and compile

- Download the source code tarball
- Uncompress using the "tar -zxvf" command
- Type command "make" to compile the program

### Typical Usage

LightSpMV accepts sparse matrices stored in the Matrix Market file format and performs SpMV in memory using the standard CSR format.

- ./lightspmv -i matrix.mm
- ./lightspmv -i matrix.mm -m 1 -d 1
- ./lightspmv -i matrix.mm -m 1 -f 0 -o out.y
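The conversion from a Matrix Market file to CSR can be sketched as follows. This is an illustration under stated assumptions, not the loader shipped with LightSpMV: it handles only the `coordinate real general` variant (no `symmetric` or `pattern` files), and the `Csr` struct and `read_matrix_market_as_csr` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Minimal CSR container (hypothetical, for illustration only).
struct Csr {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<double> values;        // size nnz
};

// Assumption: input is "coordinate real general" Matrix Market data.
Csr read_matrix_market_as_csr(std::istream& in) {
    std::string line;
    if (!std::getline(in, line) || line.rfind("%%MatrixMarket", 0) != 0)
        throw std::runtime_error("missing %%MatrixMarket banner");
    // Skip comment lines; the first non-comment line holds the sizes.
    while (std::getline(in, line) && (line.empty() || line[0] == '%')) {}
    std::istringstream size_line(line);
    Csr m;
    std::size_t nnz = 0;
    if (!(size_line >> m.rows >> m.cols >> nnz))
        throw std::runtime_error("bad size line");

    // Pass 1: read the (row, col, value) triplets and count entries per row.
    std::vector<std::size_t> ri(nnz), ci(nnz);
    std::vector<double> v(nnz);
    m.row_ptr.assign(m.rows + 1, 0);
    for (std::size_t k = 0; k < nnz; ++k) {
        if (!(in >> ri[k] >> ci[k] >> v[k]))
            throw std::runtime_error("truncated entry list");
        if (ri[k] < 1 || ri[k] > m.rows || ci[k] < 1 || ci[k] > m.cols)
            throw std::runtime_error("index out of range");
        --ri[k];  // Matrix Market indices are 1-based
        --ci[k];
        ++m.row_ptr[ri[k] + 1];
    }
    // A prefix sum turns per-row counts into row start offsets.
    for (std::size_t r = 0; r < m.rows; ++r) m.row_ptr[r + 1] += m.row_ptr[r];
    assert(m.row_ptr.back() == nnz);

    // Pass 2: scatter entries into their rows, keeping file order within a row.
    std::vector<std::size_t> next(m.row_ptr.begin(), m.row_ptr.end() - 1);
    m.col_idx.resize(nnz);
    m.values.resize(nnz);
    for (std::size_t k = 0; k < nnz; ++k) {
        const std::size_t pos = next[ri[k]]++;
        m.col_idx[pos] = ci[k];
        m.values[pos] = v[k];
    }
    return m;
}
```

Passing a `std::ifstream` opened on `matrix.mm` works the same way as the string stream used for testing; column indices within a row are left in file order rather than sorted.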

## Change Log

## Contact

If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.