CUDA external class linkage and unresolved extern function in ptxas file
Question
I'm working with CUDA and I have created an int2_ class to deal with complex integer numbers.
The class declaration in the ComplexTypes.h file is as follows:
namespace LibraryNameSpace
{
    class int2_ {
    public:
        int x;
        int y;
        // Constructors
        __host__ __device__ int2_(const int, const int);
        __host__ __device__ int2_();
        // etc.
        // Assignments from other types
        __host__ __device__ const int2_& operator=(const int);
        __host__ __device__ const int2_& operator=(const float);
        // etc.
    };
}
The ComplexTypes.cpp file is as follows:
#include "ComplexTypes.h"
__host__ __device__ LibraryNameSpace::int2_::int2_(const int x_,const int y_) { x=x_; y=y_;}
__host__ __device__ LibraryNameSpace::int2_::int2_() {}
// etc.
__host__ __device__ const LibraryNameSpace::int2_& LibraryNameSpace::int2_::operator=(const int a) { x = a; y = 0; return *this; }
__host__ __device__ const LibraryNameSpace::int2_& LibraryNameSpace::int2_::operator=(const float a) { x = (int)a; y = 0; return *this; }
// etc.
Everything works well. In main (which includes ComplexTypes.h) I can work with int2_ numbers.
In the CudaMatrix.cu file, I now include ComplexTypes.h and define and explicitly instantiate the following __global__ function:
template <class T1, class T2>
__global__ void evaluation_matrix(T1* data_, T2* ob, int NumElements)
{
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < NumElements) data_[i] = ob[i];
}
template __global__ void evaluation_matrix(LibraryNameSpace::int2_*, int*, int);
The situation of the CudaMatrix.cu file seems to be symmetric to that of the main function. Nevertheless, the compiler complains:
Error 19 error : Unresolved extern function '_ZN16LibraryNameSpace5int2_aSEi' C:\Users\Documents\Project\Test\Testing_Files\ptxas simpleTest
Please consider that:
- Before moving the implementation to separate files, everything was working correctly when both declarations and implementations were included in the main file.
- The problematic instruction is data_[i] = ob[i].
Does anyone have an idea of what is going on?
Recommended answer
The procedure I followed in my post above has two issues:
- The ComplexTypes.cpp file must be renamed to ComplexTypes.cu so that nvcc can process the CUDA keywords __device__ and __host__. This was pointed out by Talonmies in his comment. Actually, before posting, I had already changed the extension from .cpp to .cu, but the compiler still complained with the same error, so I naively reverted the change;
- In Visual Studio 2010, one has to use View -> Property Pages; Configuration Properties -> CUDA C/C++ -> Common -> Generate Relocatable Device Code -> Yes (-rdc=true). This is necessary for separate compilation. Indeed, the NVIDIA CUDA Compiler Driver NVCC documentation says:
"CUDA works by embedding device code into host objects. In whole program compilation, it embeds executable device code into the host object. In separate compilation, we embed relocatable device code into the host object, and run the device linker (nvlink) to link all the device code together. The output of nvlink is then linked together with all the host objects by the host linker to form the final executable. The generation of relocatable vs executable device code is controlled by the --relocatable-device-code={true,false} option, which can be shortened to -rdc={true,false}."