CUDA __device__ Unresolved extern function


Question

I am trying to understand how to decouple CUDA __device__ code into separate files, with the declarations in a header.

I have three files.

File 1: int2.cuh

#ifndef INT2_H_
#define INT2_H_

#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void kernel();
__device__ int k2(int k);

int launchKernel(int dim);

#endif /* INT2_H_ */

File 2: int2.cu

#include "int2.cuh"
#include "cstdio"

__global__ void kernel() {
    int tid = threadIdx.x;
    printf("%d\n", k2(tid));
}

__device__ int k2(int i) {
    return i * i;
}

int launchKernel(int dim) {
    kernel<<<1, dim>>>();
    cudaDeviceReset();
    return 0;
}

File 3: CUDASample.cu

#include <stdio.h>
#include <stdlib.h>
#include "int2.cuh"
#include "iostream"

using namespace std;

static const int WORK_SIZE = 256;

__global__ void sampleCuda() {
    int tid = threadIdx.x;
//    printf("%d\n", k2(tid)); //Can not call k2
    printf("%d\n", tid * tid);
}

int main(void) {

    int var;
    var = launchKernel(16);

    kernel<<<1, 16>>>();
    cudaDeviceReset();

    sampleCuda<<<1, 16>>>();
    cudaDeviceReset();

    return 0;
}

The code works fine. I can call the sampleCuda() kernel (defined in the same file), the C function launchKernel() (defined in the other file), and kernel() directly (also defined in the other file).

The problem arises when I call the __device__ function k2() from the sampleCuda() kernel: the build then fails with the following error. However, the same function is callable from kernel().

10:58:11 **** Incremental Build of configuration Debug for project CUDASample ****
make all 
Building file: ../src/CUDASample.cu
Invoking: NVCC Compiler
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 -gencode arch=compute_20,code=sm_20  -odir "src" -M -o "src/CUDASample.d" "../src/CUDASample.cu"
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 --compile --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20  -x cu -o  "src/CUDASample.o" "../src/CUDASample.cu"
../src/CUDASample.cu(18): warning: variable "var" was set but never used

../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced

../src/CUDASample.cu(18): warning: variable "var" was set but never used

../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced

ptxas fatal   : Unresolved extern function '_Z2k2i'
make: *** [src/CUDASample.o] Error 255

10:58:14 Build Finished (took 2s.388ms)


Recommended answer

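The issue is that you defined a __device__ function in a separate compilation unit from the __global__ function that calls it. k2() is defined in int2.cu, so kernel() in the same file can call it, but sampleCuda() in CUDASample.cu cannot: your build log shows --relocatable-device-code=false, which means the device code of each file is compiled on its own and ptxas never sees the definition of k2(). You need to either explicitly enable relocatable device code mode by adding the -dc flag, or move the definition into the same compilation unit as the caller.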
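As a rough illustration of the second option, the definition of k2() can sit in the same translation unit as the kernel that calls it. The sketch below is a single self-contained file; in the layout from the question, the k2() definition would instead move from int2.cu into int2.cuh. Marking it inline is an assumption made here so that several .cu files could include such a header without duplicate definitions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// In the question's layout this definition would live in int2.cuh
// (and be removed from int2.cu). `inline` is an assumption of this
// sketch: it lets several .cu files include the header safely.
__device__ inline int k2(int i) {
    return i * i;
}

__global__ void sampleCuda() {
    int tid = threadIdx.x;
    printf("%d\n", k2(tid));  // k2 is defined in this translation unit
}

int main() {
    sampleCuda<<<1, 16>>>();
    // Wait for the kernel to finish so its printf output is flushed.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

This variant needs no relocatable device code, so it should build with the default whole-program settings shown in the build log.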

From the nvcc documentation:


--device-c | -dc: Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains relocatable device code. It is equivalent to --relocatable-device-code=true --compile.
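If the files stay separate, a build with -dc might look like the sketch below. It assumes int2.cuh and int2.cu are exactly as posted in the question; the object file names and the omission of any -gencode options are choices made for illustration, not taken from the original project settings.

```cuda
// CUDASample.cu (sketch). Assumed build, with int2.cuh and int2.cu unchanged:
//   nvcc -dc int2.cu -o int2.o
//   nvcc -dc CUDASample.cu -o CUDASample.o
//   nvcc int2.o CUDASample.o -o CUDASample
// The last nvcc call also performs the device link step.
#include <cstdio>
#include "int2.cuh"  // declares __device__ int k2(int k); defined in int2.cu

__global__ void sampleCuda() {
    int tid = threadIdx.x;
    printf("%d\n", k2(tid));  // reference to k2 is resolved at device link time
}

int main() {
    sampleCuda<<<1, 16>>>();
    // Wait for the kernel to finish so its printf output is flushed.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Both object files have to be compiled with relocatable device code enabled; compiling only one of them with -dc would leave the reference to k2() unresolved again.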

See "Separate Compilation and Linking of CUDA C++ Device Code" for more information.
