Linker errors 2005 and 1169 (multiply defined symbols) when using CUDA __device__ functions (should be inline by default)
Problem description
This question is very much related to:
A) How to separate CUDA code into multiple files
B) Link error LNK2005 when trying to compile several CUDA files together
Following the advice given here: http://meta.stackexchange.com/questions/42343/same-question-but-not-quite and here: http://meta.stackexchange.com/questions/8910/asking-a-similar-but-not-the-same-question, I am asking a very similar question, but I want to be absolutely clear about where my question differs from the questions linked above.
I was getting the linker errors from the title when including a header file that contains the definition of a __device__ function in multiple source files.
This is different from Link A), where the same errors occur with __kernel__ functions, because __device__, according to the CUDA manual, implies inline:
In device code compiled for devices of compute capability 1.x, a __device__ function is always inlined by default. The __noinline__ function qualifier however can be used as a hint for the compiler not to inline the function if possible (see Section E.1).
Link B) is more closely related (and one answer correctly points out that the function seems not to get inlined, no matter what the manual says). However, Link B) refers to a header shipped by NVIDIA rather than one of my own. A problem is quite likely to lie within my header file but most unlikely to lie within an NVIDIA header file. In other words, Link B) and my question probably have different answers.
In the meantime I have found out that declaring the function as __device__ inline solves the problem, so the above is only here to document the solution for the rest of the world.
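The workaround described above can be sketched as a small header. The file name square.cuh, the include guard, and the function square are hypothetical names chosen for illustration, and the fallback macro is an assumption added only so that the same header also compiles as plain C++ outside nvcc:

```cpp
// square.cuh: hypothetical header, assumed to be included by several .cu files
#ifndef SQUARE_CUH
#define SQUARE_CUH

#ifndef __CUDACC__
// Assumption for illustration: when not compiling with nvcc, make
// __device__ expand to nothing so the header is still valid C++.
#define __device__
#endif

// With plain `__device__` and no `inline`, the question reports LNK2005 and
// LNK1169 once this header is included from more than one source file.
// Adding `inline` is the fix the asker found.
__device__ inline float square(float x)
{
    return x * x;
}

#endif // SQUARE_CUH
```

Each source file that needs the function would simply include the header; under the fix reported above, the duplicate definitions should no longer reach the linker as conflicting symbols.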
The open question is the reason for that behaviour.
Possible explanations I came up with:
- The manual is wrong.
- nvcc -arch=compute_11 does not qualify as "compiling for devices of compute capability 1.x", or there is a bug in nvcc.
- This is MS-VS specific and does work on the platforms tested by NVIDIA.
- I have a severe misconception about how inline works. A non-CUDA example can be found here: Multiply defined linker error using inlined functions. My understanding is the one expressed by "caf" there, that "the compiler shouldn't generate an external definition of the function, so it shouldn't bother the linker"; others over there seemed to disagree.
I'd greatly appreciate it if someone with more insight could clarify what is happening here.
Answer
In MS VS, as well as in gcc and possibly other compilers (but not in the one referenced by your "multiply defined linker error" link), inline implies static by default. You can force a function to be extern inline, but unless you do, the compiler either won't place an external definition of the function into the object file, or will somehow mark it as safe to duplicate.
HOWEVER, nowhere in the documentation does it say that CUDA __device__ functions are effectively declared inline (and therefore static). The documentation says that the function is "always inlined by default". There's a subtle difference: being inlined at its call sites is an optimization, while being declared inline changes how the definition is presented to the linker.
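To make the distinction concrete, here is a minimal sketch in plain C++ of three ways to define a function in a header. The header name and the function names are hypothetical, and the comments describe standard C++ linkage rules rather than anything specific to nvcc:

```cpp
// linkage_demo.h: hypothetical header, assumed to be included by several .cpp files
#ifndef LINKAGE_DEMO_H
#define LINKAGE_DEMO_H

// Plain definition: external linkage, and only one definition is allowed
// in the whole program. Including this header from two source files is
// expected to give a duplicate-symbol error at link time.
// (Left commented out so that the sketch itself stays linkable.)
// int twice_plain(int x) { return 2 * x; }

// `inline`: identical definitions may appear in every translation unit
// that includes the header, and the linker is allowed to merge them.
inline int twice_inline(int x) { return 2 * x; }

// `static`: internal linkage, so each translation unit gets its own
// private copy and the linker never sees a conflict.
static int twice_static(int x) { return 2 * x; }

#endif // LINKAGE_DEMO_H
```

Whether the optimizer actually expands a call in place is a separate matter from these linkage rules, which is the difference between "declared inline" and "always inlined" that the answer points to.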