Cblas gemm performance for sparse matrices


Problem Description


What could be the reason behind a cblas_sgemm call taking much less time for matrices with a large number of zeros as compared to the same cblas_sgemm call for dense matrices?

I know gemv is designed for matrix-vector multiplication, but why can't I use gemm for vector-matrix multiplication if it takes less time, especially for sparse matrices?

A short representative code is given below. It asks the user to enter a value and then populates a vector with that value. It then replaces every 32nd value with its index. So, if we enter '0' we get a sparse vector, but for any other value we get a dense vector.

#include <iostream>
#include <stdio.h>
#include <time.h>
#include <cblas.h>
#include <cublas_v2.h>
using namespace std;

int main()
{
    const int m = 5000;

    timespec blas_start, blas_end;
    long totalnsec; //total nano sec
    double totalsec, totaltime;
    int i, j;
    float *A = new float[m];   // 1 x m
    float *B = new float[m*m]; // m x m
    float *C = new float[m];   // 1 x m

    float input;
    cout << "Enter a value to populate the vector (0 for sparse) ";
    cin >> input; // enter 0 for sparse

    // input matrix A: every 32nd element is set to its index, rest of the values = input
    for(i = 0; i < m; i++)
    {
        A[i] = input;
        if(i % 32 == 0)    // adjust for sparsity
            A[i] = i;
    }

    // input matrix B: identity matrix
    for(i = 0; i < m; i++)
        for(j = 0; j < m; j++)
            B[i*m + j] = (i==j);

    clock_gettime(CLOCK_REALTIME, &blas_start);
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 1, m, m, 1.0f, A, m, B, m, 0.0f, C, m);
    //cblas_sgemv(CblasRowMajor, CblasNoTrans, m, m, 1.0f, B, m, A, 1, 0.0f, C, 1);
    clock_gettime(CLOCK_REALTIME, &blas_end);

    /* for(i = 0; i < m; i++)
           printf("%f ", C[i]);
       printf("\n\n"); */

    // Print time
    totalsec = (double)blas_end.tv_sec - (double)blas_start.tv_sec;
    totalnsec = blas_end.tv_nsec - blas_start.tv_nsec;
    if(totalnsec < 0)
    {
        totalnsec += 1e9;
        totalsec -= 1;
    }
    totaltime = totalsec + (double)totalnsec*1e-9;
    cout << "Duration = " << totaltime << "\n";

    return 0;
}
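As an aside, the second/nanosecond bookkeeping at the end of main can be pulled out into a small helper. This is only a sketch, not part of the original program, and the helper name elapsed_seconds is made up for illustration:

// Sketch of a helper performing the same elapsed-time arithmetic as main above,
// including the borrow of one second when the nanosecond difference is negative.
double elapsed_seconds(const timespec &start, const timespec &end)
{
    double sec = (double)end.tv_sec - (double)start.tv_sec;
    long nsec = end.tv_nsec - start.tv_nsec;
    if(nsec < 0)
    {
        nsec += 1000000000L;
        sec -= 1;
    }
    return sec + (double)nsec * 1e-9;
}

With such a helper, the reporting line reduces to cout << "Duration = " << elapsed_seconds(blas_start, blas_end) << "\n";.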





When I run this code on Ubuntu 14.04, I get the following results:

erisp@ubuntu:~/uas/stackoverflow$ g++ gemmcomp.cpp -o gemmcomp.o -lblas
erisp@ubuntu:~/uas/stackoverflow$ ./gemmcomp.o
Enter a value to populate the vector (0 for sparse) 5
Duration = 0.0291558
erisp@ubuntu:~/uas/stackoverflow$ ./gemmcomp.o
Enter a value to populate the vector (0 for sparse) 0
Duration = 0.000959521



showing that the cblas_sgemm call for sparse matrices is much more efficient than the same call for dense matrices. What could be the reason?

What I have tried:

I have already tested the output, and it is correct.
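For what it's worth, the commented-out cblas_sgemv line in the program above can be timed side by side with the cblas_sgemm call. The following is only a sketch along those lines, not code from the original post: it reuses the same data layout, and because B is the identity matrix here, the gemv product B*A and the gemm product A*B give the same result. It should build with the same kind of command, e.g. g++ gemmcompare.cpp -o gemmcompare -lblas (the file name is just an example).

#include <iostream>
#include <time.h>
#include <cblas.h>
using namespace std;

int main()
{
    const int m = 5000;
    float *A = new float[m];   // 1 x m row vector
    float *B = new float[m*m]; // m x m matrix
    float *C = new float[m];   // 1 x m result

    float input;
    cout << "Enter a value to populate the vector (0 for sparse) ";
    cin >> input;

    // Same fill pattern as the original program
    for(int i = 0; i < m; i++)
        A[i] = (i % 32 == 0) ? (float)i : input;
    for(int i = 0; i < m; i++)
        for(int j = 0; j < m; j++)
            B[i*m + j] = (i == j);

    timespec t0, t1;

    // Time the 1 x m by m x m sgemm call from the question
    clock_gettime(CLOCK_REALTIME, &t0);
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 1, m, m, 1.0f, A, m, B, m, 0.0f, C, m);
    clock_gettime(CLOCK_REALTIME, &t1);
    double gemm_sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    cout << "sgemm duration = " << gemm_sec << "\n";

    // Time the commented-out sgemv alternative (C = B * A)
    clock_gettime(CLOCK_REALTIME, &t0);
    cblas_sgemv(CblasRowMajor, CblasNoTrans, m, m, 1.0f, B, m, A, 1, 0.0f, C, 1);
    clock_gettime(CLOCK_REALTIME, &t1);
    double gemv_sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    cout << "sgemv duration = " << gemv_sec << "\n";

    delete[] A;
    delete[] B;
    delete[] C;
    return 0;
}

Note that whichever call runs second benefits from warm caches, so it may be fairer to time the two variants in separate runs of the program.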

Recommended Answer


