Linker errors 2005 and 1169 (multiply defined symbols) when using CUDA __device__ functions (should be inline by default)


Problem description

This question is very much related to:

A) How to separate CUDA code into multiple files

B) Link error LNK2005 when trying to compile several CUDA files together

Following advice from here: http://meta.stackexchange.com/questions/42343/same-question-but-not-quite and here http://meta.stackexchange.com/questions/8910/asking-a-similar-but-not-the-same-question

I am asking a very similar question, but I want to be absolutely clear about how my question differs from the questions linked above.

I was getting the linker errors from the title when including a header file, which contained the definition of a __device__ function, into multiple source files.

This is different from Link A), where the same errors occur with kernel (__global__) functions, because __device__, according to the CUDA manual, implies inline:

In device code compiled for devices of compute capability 1.x, a __device__ function is always inlined by default. The __noinline__ function qualifier however can be used as a hint for the compiler not to inline the function if possible (see Section E.1).

Link B) is more closely related (and one answer correctly points out that the function does not seem to get inlined, no matter what the manual says), but Link B) refers to a header shipped by NVIDIA rather than one of my own. A problem is quite likely to lie within my header file and most unlikely to lie within an NVIDIA header file. In other words, Link B) and my question probably have different answers.

In the meantime I have found out that declaring the function as __device__ inline solves the problem, so the above mainly serves to document the solution for the rest of the world.
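As a minimal sketch of that fix, assume a hypothetical header device_math.cuh containing a hypothetical helper square (neither name comes from my real code). The definition can stay in the header as long as it is marked inline. The __host__ qualifier and the macro fallback are additions made only so the snippet also compiles with a plain C++ compiler; the fix itself does not need them.

```cpp
// device_math.cuh -- hypothetical header, included from several .cu files
#ifndef DEVICE_MATH_CUH
#define DEVICE_MATH_CUH

// Assumption: when nvcc is not the compiler, define the CUDA qualifiers
// away so that a host-only C++ compiler can still parse this file.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Without `inline`, every source file that includes this header carries
// its own external definition of square(), which is the layout the
// question describes as leading to LNK2005/LNK1169.
// With `inline`, identical definitions in several translation units are
// permitted.
__host__ __device__ inline float square(float x)
{
    return x * x;
}

#endif // DEVICE_MATH_CUH
```

Any number of .cu files can then include the header and call square() without each include adding another conflicting definition.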

The open question is the reason for that behaviour.

Possible explanations I came up with:

  • The manual is wrong
  • nvcc -arch=compute_11 does not qualify as "compiling for devices of compute capability 1.x" or there is a bug in nvcc
  • This is MS-VS specific and works as documented on the platforms tested by NVIDIA
  • I have a severe misconception about how inline works. A non-CUDA example can be found here: Multiply defined linker error using inlined functions. My understanding matches the one expressed by "caf" there, that "the compiler shouldn't generate an external definition of the function, so it shouldn't bother the linker", but others there seemed to disagree.

I'd greatly appreciate it if someone with more insight could clarify what is happening here.

Solution

In MS VS, as well as in gcc and possibly other compilers (but not in the one referenced by your "multiply defined linker error" link), inline implies static by default. You can force a function to be extern inline, but, unless you do, the compiler either won't place an external definition of the function into the object file, or will mark it as safe to duplicate somehow.

HOWEVER, nowhere in the documentation does it say that CUDA __device__ functions are effectively declared inline (and therefore static). The documentation says that the function is "always inlined by default". There's a subtle difference.
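A small sketch may make the difference concrete. The names scale_inline and scale_static are hypothetical, and the macro fallback is only there so that a host-only C++ compiler accepts the file. As far as I understand the C++ rules, whether the compiler inlines a call is an optimization decision, whereas the inline and static keywords change how the definition is treated at link time, and only the latter matters for LNK2005.

```cpp
// Assumption: define the CUDA qualifiers away when nvcc is not in use,
// so this sketch also compiles as ordinary C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// (1) Plain definition (left commented out): external linkage, so it may
//     be defined in only one translation unit. Placing it in a header that
//     several source files include leads to duplicate-symbol errors, even
//     if the compiler happens to inline every call.
// __host__ __device__ float scale_plain(float x) { return 2.0f * x; }

// (2) `inline`: identical definitions may appear in several translation
//     units, so the definition can live in a shared header.
__host__ __device__ inline float scale_inline(float x)
{
    return 2.0f * x;
}

// (3) `static`: internal linkage, so each translation unit gets its own
//     private copy that the linker never compares with the others.
static __host__ __device__ float scale_static(float x)
{
    return 2.0f * x;
}
```

Either (2) or (3) can live in a header shared by several .cu files. Variant (1) would have to move into exactly one source file, with only a declaration left in the header; for __device__ functions that additionally requires nvcc's relocatable device code mode (-rdc=true), as far as I know.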
