Cuda function pointers


Problem description

I was trying to make something like this (I actually need to write some integration functions) in CUDA.

I tried this, but it did not work; it only caused the following error:

Error: Function pointers and function template parameters are not supported in sm_1x.

float f1(float x) {
    return x;
}

__global__ void tabulate(float lower, float upper, float p_function(float), float *result) {
    for (lower; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float *dev_res;

    cudaMalloc((void **)&dev_res, sizeof(float));

    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);

    printf("%f\n", res);
    /************************************************************************/
    scanf("%s");

    return 0;
}

Solution

To get rid of your compile error, you'll have to use -gencode arch=compute_20,code=sm_20 as a compiler argument when compiling your code. But then you'll likely have some runtime problems:
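
For example, assuming the source file is saved as tabulate.cu (the file name here is only illustrative), the compile command would look something like this:

nvcc -gencode arch=compute_20,code=sm_20 -o tabulate tabulate.cu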

Taken from the CUDA Programming Guide http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions

Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.

It is not allowed to take the address of a __device__ function in host code.

So you can have something like this (adapted from the "FunctionPointers" sample):

//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float
typedef unsigned char(*pointFunction_t)(unsigned char, float);

//some device function to be pointed to
__device__ unsigned char
Threshold(unsigned char in, float thresh)
{
   ...
}

//pComputeThreshold is a device-side function pointer to your __device__ function
__device__ pointFunction_t pComputeThreshold = Threshold;
//the host-side function pointer to your __device__ function
pointFunction_t h_pointFunction;

//in host code: copy the function pointers to their host equivalent
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));

You can then pass the h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function.

//your kernel taking your __device__ function pointer as a parameter
__global__ void kernel(pointFunction_t pPointOperation)
{
    unsigned char tmp;
    ...
    tmp = (*pPointOperation)(tmp, 150.0);
    ...
}

//invoke the kernel in host code, passing in your host-side __device__ function pointer
kernel<<<...>>>(h_pointFunction);

Hopefully that made some sense. In all, it looks like you would have to change your f1 function to be a __device__ function and follow a similar procedure (the typedefs aren't necessary, but they do make the code nicer) to get a valid function pointer on the host side to pass to your kernel. I'd also advise giving the FunctionPointers CUDA sample a look over.
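
Putting that together with the code from the question, a minimal sketch might look like the following. It is only an illustration of the approach above, not tested code; the names funcPtr_t and d_f1_ptr are made up for this example, and a cudaMemset call is added so the accumulator starts at zero (the original code never initialized it).

#include <cstdio>
#include <cuda_runtime.h>

// function pointer type matching f1: takes a float, returns a float
typedef float (*funcPtr_t)(float);

// f1 must be a __device__ function so it can be called from the kernel
__device__ float f1(float x)
{
    return x;
}

// device-side function pointer initialized to f1
__device__ funcPtr_t d_f1_ptr = f1;

__global__ void tabulate(float lower, float upper, funcPtr_t p_function, float *result)
{
    for (; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main()
{
    float res;
    float *dev_res;
    funcPtr_t h_f1_ptr;

    cudaMalloc((void **)&dev_res, sizeof(float));
    cudaMemset(dev_res, 0, sizeof(float));   // make sure the accumulator starts at 0

    // copy the device-side function pointer to its host-side equivalent
    cudaMemcpyFromSymbol(&h_f1_ptr, d_f1_ptr, sizeof(funcPtr_t));

    // pass the host copy of the pointer into the kernel
    tabulate<<<1, 1>>>(0.0f, 5.0f, h_f1_ptr, dev_res);

    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", res);   // 0 + 1 + 2 + 3 + 4 = 10.0

    cudaFree(dev_res);
    return 0;
}

Built with the -gencode arch=compute_20,code=sm_20 flag from above (or a newer architecture), this is the same pattern as the Threshold example, just applied to the question's f1 and tabulate.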
