How to properly link a CUDA header file with device functions?
Problem description
I'm trying to decouple my code a bit, and something fails. The compilation error is:
error: calling a __host__ function("DecoupledCallGpu") from a __global__ function("kernel") is not allowed
Code excerpts:
main.c (has a call to the CUDA host function):
#include "cuda_computations.h"
...
ComputeSomething(&var1,&var2);
...
cuda_computations.cu (has the kernel and the host master functions, and includes the header that declares the device functions):
#include "cuda_computations.h"
#include "decoupled_functions.cuh"
...
__global__ void kernel(){
...
DecoupledCallGpu(&var_kernel);
}
void ComputeSomething(int *var1, int *var2){
//allocate memory and etc..
...
kernel<<<20,512>>>();
//cleanup
...
}
decoupled_functions.cuh:
#ifndef _DECOUPLEDFUNCTIONS_H_
#define _DECOUPLEDFUNCTIONS_H_
void DecoupledCallGpu(int *var);
#endif
decoupled_functions.cu:
#include "decoupled_functions.cuh"
__device__ void DecoupledCallGpu(int *var){
*var=0;
}
Compilation:
nvcc -g --ptxas-options=-v -arch=sm_30 -c cuda_computations.cu -o cuda_computations.o -lcudart
Question: why is it that the DecoupledCallGpu is called from host function and not a kernel as it was supposed to?
P.S.: I can share the actual code behind it if you need me to.
Recommended answer
Add the __device__ decorator to the prototype in decoupled_functions.cuh. That should take care of the error message you are seeing.
Then you'll need to use separate compilation and linking among your modules. So instead of compiling with -c, compile with -dc, and modify your link command to match. A basic example is here.
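As a rough illustration of the two steps above, here is a minimal sketch that merges the header prototype and the device function definition into a single file so it builds on its own. The file name decouple_demo.cu, the kernel's output buffer, and the main.o and app names in the comments are assumptions made for this example, not part of the original project. The build commands in the comments show what the multi-file layout would probably need; the single-file version does not require -dc.

```cuda
// decouple_demo.cu (hypothetical file name): single-file sketch of the fix.
// In the real project the prototype stays in decoupled_functions.cuh and the
// definition in decoupled_functions.cu. They are merged here so that
//   nvcc decouple_demo.cu -o decouple_demo
// is enough to build it.
//
// For the multi-file layout, the build would look roughly like this
// (main.o and app are assumed names; choose -arch to match your GPU and toolkit):
//   nvcc -arch=sm_30 -dc cuda_computations.cu   -o cuda_computations.o
//   nvcc -arch=sm_30 -dc decoupled_functions.cu -o decoupled_functions.o
//   nvcc -arch=sm_30 cuda_computations.o decoupled_functions.o main.o -o app
#include <cstdio>
#include <cuda_runtime.h>

// Prototype as it should appear in the header: note the __device__ decorator.
__device__ void DecoupledCallGpu(int *var);

// The output buffer is an addition for this sketch so the result can be checked.
__global__ void kernel(int *out) {
    int var_kernel = -1;
    DecoupledCallGpu(&var_kernel);  // legal: the compiler sees a __device__ function
    out[blockIdx.x * blockDim.x + threadIdx.x] = var_kernel;
}

// Definition, as in decoupled_functions.cu.
__device__ void DecoupledCallGpu(int *var) {
    *var = 0;
}

int main() {
    const int blocks = 20, threads = 512, n = blocks * threads;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    kernel<<<blocks, threads>>>(d_out);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    // Copy back one element to confirm the device function ran.
    int first = -1;
    cudaMemcpy(&first, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("first element after kernel: %d\n", first);

    cudaFree(d_out);
    return 0;
}
```

With -dc, nvcc generates relocatable device code, which is what lets a kernel in one object file call a __device__ function defined in another; the final nvcc invocation then performs the device link along with the host link.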
Your question is a bit confused:
Question: why is it that the DecoupledCallGpu is called from host function and not a kernel as it was supposed to?
I can't tell if you're tripping over English or if there is a misunderstanding here. The actual error message states:
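error: calling a __host__ function("DecoupledCallGpu") from a __global__ function("kernel") is not allowed
When cuda_computations.cu is compiled, the only description of DecoupledCallGpu that the compiler sees is the prototype from the header: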
void DecoupledCallGpu(int *var);
This prototype indicates an undecorated function in CUDA C, and such functions are equivalent to __host__ (only) decorated functions:
__host__ void DecoupledCallGpu(int *var);
That compilation unit has no knowledge of what is actually in decoupled_functions.cu.
Therefore, when you have kernel code like this:
__global__ void kernel(){ //<- __global__ function
...
DecoupledCallGpu(&var_kernel); //<- appears as a __host__ function to compiler
}
the compiler thinks you are trying to call a __host__ function from a __global__ function, which is illegal.
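To make the decoration rule concrete, the following sketch puts the qualifiers side by side in one file. The function names HostOnly, DeviceOnly and Both and the file name are hypothetical, and the __host__ __device__ combination is not mentioned in the question; it is included only as the usual way to make one function callable from both sides.

```cuda
// qualifiers_demo.cu (hypothetical file name): which functions a kernel may call.
#include <cstdio>
#include <cuda_runtime.h>

void HostOnly(int *v) { *v = 1; }                   // undecorated: treated as __host__
__device__ void DeviceOnly(int *v) { *v = 2; }      // callable from device code only
__host__ __device__ void Both(int *v) { *v = 3; }   // compiled for host and device

__global__ void kernel(int *out) {
    int a = 0, b = 0;
    DeviceOnly(&a);
    Both(&b);
    // HostOnly(&a);  // uncommenting this reproduces the error from the question:
    //                // calling a __host__ function from a __global__ function
    out[0] = a;
    out[1] = b;
}

int main() {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    kernel<<<1, 1>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    int from_device[2] = {0, 0};
    cudaMemcpy(from_device, d_out, sizeof(from_device), cudaMemcpyDeviceToHost);

    // Host code may call the undecorated function and the __host__ __device__ one.
    int h = 0, hb = 0;
    HostOnly(&h);
    Both(&hb);

    printf("device: %d %d, host: %d %d\n", from_device[0], from_device[1], h, hb);
    cudaFree(d_out);
    return 0;
}
```

The compiler decides what is legal from the declaration it can see, so a header prototype without __device__ is enough to trigger the error even when the definition in another file is decorated correctly.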