CUDA function pointers

Question
I was trying to make something like this (actually I need to write some integration functions) in CUDA:
#include <iostream>
using namespace std;

float f1(float x) {
    return x * x;
}

float f2(float x) {
    return x;
}

void tabulate(float p_f(float)) {
    for (int i = 0; i != 10; ++i) {
        std::cout << p_f(i) << ' ';
    }
    std::cout << std::endl;
}

int main() {
    tabulate(f1);
    tabulate(f2);
    return 0;
}
Output:
0 1 4 9 16 25 36 49 64 81
0 1 2 3 4 5 6 7 8 9
I tried the following but only got the error:
Error: Function pointers and function template parameters are not supported in sm_1x.
float f1(float x) {
    return x;
}

__global__ void tabulate(float lower, float upper, float p_function(float), float* result) {
    for (; lower < upper; lower++) {
        *result = *result + p_function(lower);
    }
}

int main() {
    float res;
    float* dev_res;
    cudaMalloc((void**)&dev_res, sizeof(float));
    tabulate<<<1,1>>>(0.0, 5.0, f1, dev_res);
    cudaMemcpy(&res, dev_res, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", res);
    scanf("%s");
    return 0;
}
Answer
To get rid of your compile error, you'll have to use -gencode arch=compute_20,code=sm_20 as a compiler argument when compiling your code. But then you'll likely have some runtime problems:
Taken from the CUDA Programming Guide http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#functions
Function pointers to __global__ functions are supported in host code, but not in device code. Function pointers to __device__ functions are only supported in device code compiled for devices of compute capability 2.x and higher.
It is not allowed to take the address of a __device__ function in host code.
So you can have something like this (adapted from the "FunctionPointers" sample):
// your function pointer type - returns unsigned char, takes parameters of type unsigned char and float
typedef unsigned char (*pointFunction_t)(unsigned char, float);

// some device function to be pointed to
__device__ unsigned char Threshold(unsigned char in, float thresh)
{
    ...
}

// pComputeThreshold is a device-side function pointer to your __device__ function
__device__ pointFunction_t pComputeThreshold = Threshold;

// the host-side function pointer to your __device__ function
pointFunction_t h_pointFunction;

// in host code: copy the function pointer to its host equivalent
cudaMemcpyFromSymbol(&h_pointFunction, pComputeThreshold, sizeof(pointFunction_t));
You can then pass h_pointFunction as a parameter to your kernel, which can use it to call your __device__ function.
// your kernel taking your __device__ function pointer as a parameter
__global__ void kernel(pointFunction_t pPointOperation)
{
    unsigned char tmp;
    ...
    tmp = (*pPointOperation)(tmp, 150.0);
    ...
}

// invoke the kernel in host code, passing in your host-side __device__ function pointer
kernel<<<...>>>(h_pointFunction);
Hopefully that made some sense. In all, it looks like you would have to change your f1 function to be a __device__ function and follow a similar procedure (the typedefs aren't necessary, but they do make the code nicer) to get a valid function pointer on the host side to pass to your kernel. I'd also advise giving the FunctionPointers CUDA sample a look.