Cuda function pointers
Problem description
I was trying to make something like this (actually I need to write some integration functions) in CUDA. I tried this, but it did not work - it only caused:
Error: Function pointers and function template parameters are not supported in sm_1x.
float f1(float x) {
    return x;
}

__global__ void tabulate(float lower, float upper, float p_function(float), float *result) {
    for (lower; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float *dev_res;
    cudaMalloc((void **)&dev_res, sizeof(float));
    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", res);
    /************************************************************************/
    scanf("%s");
    return 0;
}
To get rid of your compile error, you'll have to use -gencode arch=compute_20,code=sm_20
as a compiler argument when compiling your code. But then you'll likely have some runtime problems:
Taken from the CUDA Programming Guide http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions
Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher. It is not allowed to take the address of a __device__ function in host code.
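As a concrete illustration of that flag (the source filename main.cu is an assumption here, since the question doesn't give one), the build line might look like:

```shell
# target compute capability 2.0 so device-side function pointers compile
nvcc -gencode arch=compute_20,code=sm_20 main.cu -o tabulate
```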
So you can have something like this (adapted from the "FunctionPointers" sample):
//your function pointer type - returns unsigned char, takes parameters of type unsigned char and float
typedef unsigned char (*pointFunction_t)(unsigned char, float);

//some device function to be pointed to
__device__ unsigned char Threshold(unsigned char in, float thresh)
{
    ...
}

//pComputeThreshold is a device-side function pointer to your __device__ function
__device__ pointFunction_t pComputeThreshold = Threshold;

//the host-side function pointer to your __device__ function
pointFunction_t h_pointFunction;

//in host code: copy the function pointer to its host equivalent
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));
You can then pass h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function.
//your kernel taking your __device__ function pointer as a parameter
__global__ void kernel(pointFunction_t pPointOperation)
{
    unsigned char tmp;
    ...
    tmp = (*pPointOperation)(tmp, 150.0);
    ...
}
//invoke the kernel in host code, passing in your host-side __device__ function pointer
kernel<<<...>>>(h_pointFunction);
Hopefully that made some sense. In all, it looks like you would have to change your f1 function to be a __device__ function and follow a similar procedure (the typedefs aren't necessary, but they do make the code nicer) to get a valid function pointer on the host side to pass to your kernel. I'd also advise giving the FunctionPointers CUDA sample a look over.
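Putting those pieces together for the original question, a minimal sketch of what the adapted program might look like (the names integrand_t, p_f1, and h_f1 are mine, not from the question; it needs a compute capability 2.x+ device and the -gencode flag above, and error checking is omitted for brevity):

```cuda
#include <cstdio>

// the integrand must now be a __device__ function
__device__ float f1(float x) {
    return x;
}

typedef float (*integrand_t)(float);

// device-side pointer to f1, so the host can fetch a usable value
__device__ integrand_t p_f1 = f1;

__global__ void tabulate(float lower, float upper, integrand_t p_function, float *result) {
    for (; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float *dev_res;
    cudaMalloc((void **)&dev_res, sizeof(float));
    cudaMemset(dev_res, 0, sizeof(float)); // the original never initialized *dev_res

    // copy the device-side function pointer to the host...
    integrand_t h_f1;
    cudaMemcpyFromSymbol(&h_f1, p_f1, sizeof(integrand_t));

    // ...and pass it back in as a kernel argument
    tabulate<<<1,1>>>(0.0f, 5.0f, h_f1, dev_res);

    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", res); // sums f1(0)+f1(1)+...+f1(4) = 10.0
    return 0;
}
```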