Is it possible to call a CUDA CUBLAS function from a global or device function?
Question
I'm trying to parallelize an existing application. Most of it is already parallelized and running on the GPU, but I'm having trouble migrating one function to the GPU.
The function calls dtrsv, which is part of the BLAS library; see below.
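/* dtrsv argument order is (uplo, trans, diag, n, matrix, lda, vector, incx),
   so here B is the triangular matrix and A is the vector solved in place. */
void dtrsv_call_N(double* B, double* A, int* n, int* lda, int* incx) {
    F77_CALL(dtrsv)("L", "T", "N", n, B, lda, A, incx);
}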
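For readers unfamiliar with dtrsv, the following is a minimal C sketch of what the call above computes with the flags "L", "T", "N": it solves A^T x = b in place, where the matrix is lower triangular with a non-unit diagonal and stored column-major as in Fortran. The name trsv_lower_trans is hypothetical and not part of BLAS or cuBLAS, and the sketch assumes incx > 0. It is a plain sequential loop meant to illustrate the semantics, not a tuned replacement for the library routine.

```c
#include <stddef.h>

/* Hypothetical helper (not a BLAS or cuBLAS function).
 * Solves A^T * x = b in place, where:
 *   - a is an n x n lower-triangular matrix, column-major, leading dimension lda
 *   - the diagonal is used as stored (non-unit diagonal)
 *   - x holds b on entry and the solution on exit, with stride incx
 * Assumption: incx > 0. */
void trsv_lower_trans(int n, const double *a, int lda, double *x, int incx)
{
    /* A^T is upper triangular, so solve by back substitution from the last row. */
    for (int j = n - 1; j >= 0; --j) {
        double sum = x[(size_t)j * incx];
        /* Row j of A^T is column j of A: entries a[i + j*lda] for i > j. */
        for (int i = j + 1; i < n; ++i) {
            sum -= a[i + (size_t)j * lda] * x[(size_t)i * incx];
        }
        x[(size_t)j * incx] = sum / a[j + (size_t)j * lda];
    }
}
```

As a small check, take the 2 x 2 lower-triangular matrix with rows (2, 0) and (1, 4), stored column-major as {2, 1, 0, 4}, and b = (4, 8). Calling trsv_lower_trans(2, a, 2, x, 1) overwrites x with (1, 2).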
I've been able to call the equivalent CUDA/cublas function from host code, as shown below, and the results match those of the Fortran dtrsv subroutine.
status = cublasDtrsv(handle, CUBLAS_FILL_MODE_LOWER, CUBLAS_OP_T, CUBLAS_DIAG_NON_UNIT, x, dev_m1, x, dev_m2, c);
if (status != CUBLAS_STATUS_SUCCESS) {
    printf("!!!! kernel execution error.\n");
    return EXIT_FAILURE;
}
My problem is that I need to be able to call cublasDtrsv from a device or global function, like this:
__global__ void Dtrsv__cm2(cublasHandle_t handle, cublasFillMode_t uplo, cublasOperation_t trans, cublasDiagType_t diag, int n, const double *A, int lda, double *x, int incx) {
    cublasDtrsv(handle, uplo, trans, diag, n, A, lda, x, incx);
}
In CUDA 4.0, compiling the kernel above fails with the error below. Does anyone know whether there is a way to call cublas functions from a __device__ or __global__ function?
error: calling a host function("cublasDtrsv_v2") from a __device__/__global__ function("Dtrsv__dev") is not allowed
Recommended answer
CUDA Toolkit 5.0 introduced a device linker that can link separately compiled device object files. I believe the CUBLAS functions in CUDA Toolkit 5.0 can now be called from device functions (but I have only reviewed the headers; I have no experience using CUBLAS).