The cublas function call cublasSgemv


Problem description


Thanks to @hubs: when calling cublasSgemv, note that CUBLAS_OP_T also swaps which dimension applies to each vector. I have been learning CUDA and cuBLAS for a month, and I want to test the performance of cuBLAS for further use. But in my matrix-vector multiplication using cublasSgemv, the answer is wrong. I initialize matrix A and vector x in row-major order, send them to the device using cudaMemcpy, and call cublasSgemv; because A is row-major, I transpose it using the parameter CUBLAS_OP_T.
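For reference, the value the program below should produce can be computed on the host: with A[i] = i stored row-major and x all ones, row i of y is the sum of A[10*i] through A[10*i + 9]. A minimal host-side reference sketch (my own addition, not part of the original program):

```cpp
#include <cassert>
#include <vector>

// Host reference for y = A * x, with A (row x col) stored row-major.
std::vector<float> gemv_rowmajor_ref(const float* A, const float* x,
                                     int row, int col) {
    std::vector<float> y(row, 0.0f);
    for (int i = 0; i < row; ++i)
        for (int j = 0; j < col; ++j)
            y[i] += A[i * col + j] * x[j];
    return y;
}
```

For the setup below (A[i] = i, x[i] = 1, a 50x10 matrix) this gives y[i] = 100*i + 45, i.e. 45, 145, ..., 4945, rather than the 45, 545, ..., 4545, 0, ..., 0 sequence the broken call returned.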

// The matrix has 50 rows and 10 columns, stored in row-major order; A[i] = i, x[i] = 1.
// The answer I get is 45, 545, ..., 4545, 0, 0, 0, 0, 0, 0, 0, 0, ..., 0

#include <iostream>
#include <cuda_runtime.h>
#include <cublas_v2.h>
using namespace std;

int GpuVec(const float* A, const float* x, float* y, const int row, const int col);

int main() {
    int row = 50;
    int col = 10;
    int N = row * col;
    float* A = new float[N];
    float* y_gpu = new float[50];
    for (int i = 0; i < N; i++) {
        A[i] = (float)i;
    }
    float* x = new float[10];
    for (int i = 0; i < 10; i++) {
        x[i] = 1;
    }
    GpuVec(A, x, y_gpu, row, col);  // call the function
    for (int i = 0; i < 50; i++) {
        cout << " " << y_gpu[i] << endl;
    }
    return 0;
}

int GpuVec(const float* A, const float* x, float* y, const int row, const int col) {
    cudaError_t cudastat;
    cublasStatus_t stat;
    int size = row * col;
    cublasHandle_t handle;
    float* d_A;  // device matrix
    float* d_x;  // device vector
    float* d_y;  // device result
    cudastat = cudaMalloc((void**)&d_A, size * sizeof(float));
    cudastat = cudaMalloc((void**)&d_x, col * sizeof(float));
    cudastat = cudaMalloc((void**)&d_y, row * sizeof(float));  // when I copy y to d_y, can I cout d_y?

    cudaMemcpy(d_A, A, sizeof(float) * size, cudaMemcpyHostToDevice);  // copy A to device d_A
    cudaMemcpy(d_x, x, sizeof(float) * col, cudaMemcpyHostToDevice);   // copy x to device d_x
    float alf = 1.0;
    float beta = 0;
    stat = cublasCreate(&handle);
    stat = cublasSgemv(handle, CUBLAS_OP_T, col, row, &alf, d_A, col, d_x, 1, &beta, d_y, 1);  // swap col and row
    cudaMemcpy(y, d_y, sizeof(float) * row, cudaMemcpyDeviceToHost);  // copy device result to host
    cudaFree(d_A);
    cudaFree(d_x);
    cudaFree(d_y);
    cublasDestroy(handle);
    return 0;
}

Answer


To use two-dimensional arrays stored in row-major order with cuBLAS (which works with column-major order), you can call gemv in this way:

stat = cublasSgemv(handle, CUBLAS_OP_T, col, row, &alf, d_A, col, d_x, 1, &beta, d_y, 1);


You have to swap m (rows) and n (columns) in the call, too, in order to perform y = A * x, but this lets you use the cuBLAS call without transposing the original array.
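The reason this works: reading a row-major row-by-col buffer as a column-major matrix with leading dimension col yields exactly A^T (a col-by-row matrix), so asking cuBLAS for the transpose of that gives back A itself. A small host-side sketch of the index arithmetic (my own illustration, not cuBLAS code):

```cpp
#include <cassert>
#include <vector>

// What Sgemv with CUBLAS_OP_T computes on a column-major m x n matrix B
// with leading dimension ld: y = B^T * x, where B(i,j) = buf[j * ld + i].
std::vector<float> gemv_op_t(const float* buf, int m, int n, int ld,
                             const float* x) {
    std::vector<float> y(n, 0.0f);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            y[j] += buf[j * ld + i] * x[i];
    return y;
}
// With buf holding a row-major row x col matrix A, calling
// gemv_op_t(buf, col, row, col, x) reads B = A^T, so y = (A^T)^T x = A x.
```

This mirrors the corrected call above: m = col, n = row, lda = col, with x of length col and y of length row.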
