CUDA atomicAdd for doubles definition error
Question
In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is common to implement it by hand with an atomicCAS loop, as in the example code below. With the new CUDA 8 RC, I run into trouble when I try to compile code that includes such a function. I guess this is because a native double version of atomicAdd was added with Pascal and Compute Capability 6.0, but its declaration is somehow not properly hidden for earlier Compute Capabilities.
The code below used to compile and run fine with previous CUDA versions, but now I get this compilation error:
test.cu(3): error: function "atomicAdd(double *, double)" has already been defined
But if I remove my implementation, I instead get this error:
test.cu(33): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (double *, double)
I should add that I only see this if I compile with -arch=sm_35 or similar. If I compile with -arch=sm_60 I get the expected behavior, i.e. only the first error, and successful compilation in the second case.
Also, it is specific to atomicAdd: if I change the name, it works well.
It really looks like a compiler bug. Can someone else confirm that this is the case?
Example code:
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void kernel(double *a)
{
    double b = 1.3;
    atomicAdd(a, b);
}

int main(int argc, char **argv)
{
    double *a;
    cudaMalloc(&a, sizeof(double));
    kernel<<<1,1>>>(a);
    cudaFree(a);
    return 0;
}
I got an answer from NVIDIA, who acknowledge this problem. Here is what the developers say about it:
The sm_60 architecture, that is newly supported in CUDA 8.0, has native fp64 atomicAdd function. Because of the limitations of our toolchain and CUDA language, the declaration of this function needs to be present even when the code is not being specifically compiled for sm_60. This causes a problem in your code because you also define a fp64 atomicAdd function.
CUDA builtin functions such as atomicAdd are implementation-defined and can be changed between CUDA releases. Users should not define functions with the same names as any CUDA builtin functions. We would suggest you to rename your atomicAdd function to one that is not the same as any CUDA builtin functions.
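The renaming that NVIDIA suggests could look roughly like the sketch below. The name atomicAddDouble, the accumulate kernel, and the run_accumulate host helper are all hypothetical and chosen only for illustration. The wrapper forwards to the native atomicAdd when compiled for compute capability 6.0 or higher and otherwise falls back to the CAS loop from the question; error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical name: anything that does not clash with a CUDA builtin will do.
__device__ double atomicAddDouble(double* address, double val)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    // Device pass for sm_60 and newer: use the native fp64 atomicAdd.
    return atomicAdd(address, val);
#else
    // Older targets: CAS-loop fallback taken from the question.
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
#endif
}

// Illustrative kernel: every thread adds 1.0 to the same accumulator.
__global__ void accumulate(double* sum)
{
    atomicAddDouble(sum, 1.0);
}

// Illustrative host helper: runs the kernel and returns the accumulated value.
double run_accumulate(int blocks, int threads)
{
    double* d_sum = nullptr;
    double h_sum = 0.0;
    cudaMalloc(&d_sum, sizeof(double));
    cudaMemcpy(d_sum, &h_sum, sizeof(double), cudaMemcpyHostToDevice);
    accumulate<<<blocks, threads>>>(d_sum);
    cudaMemcpy(&h_sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    return h_sum;
}

int main()
{
    printf("sum = %f\n", run_accumulate(4, 64));
    return 0;
}
```

Because the wrapper has its own name, the same source should build with -arch=sm_35 and with -arch=sm_60 without the redefinition error.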
Recommended answer
That flavor of atomicAdd is new in compute capability 6.0. You can keep your previous implementation for other compute capabilities by guarding it with a preprocessor check on __CUDA_ARCH__:
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
<... place here your own pre-pascal atomicAdd definition ...>
#endif
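As a minimal sketch of how the guard might be filled in, the block below places the CAS-loop implementation from the question in the #else branch and reuses the question's kernel. It assumes CUDA 8.0 or newer, where the toolkit headers already declare the double overload for the host pass and for sm_60 and above. The run_kernel helper is hypothetical and added only so the result can be read back; error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
// Host pass or sm_60+: rely on the declaration provided by the toolkit
// (assumption: CUDA 8.0 or newer).
#else
// Device pass for pre-Pascal targets: CAS-loop implementation from the question.
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}
#endif

// Kernel from the question: a single atomic add of 1.3.
__global__ void kernel(double* a)
{
    double b = 1.3;
    atomicAdd(a, b);
}

// Hypothetical host helper: zero the value, run the kernel, copy the result back.
double run_kernel()
{
    double* a = nullptr;
    double result = 0.0;
    cudaMalloc(&a, sizeof(double));
    cudaMemset(a, 0, sizeof(double));  // all-zero bits represent 0.0
    kernel<<<1, 1>>>(a);
    cudaMemcpy(&result, a, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(a);
    return result;
}

int main()
{
    printf("a = %f\n", run_kernel());
    return 0;
}
```

This keeps the name atomicAdd, which NVIDIA's reply advises against, so a differently named wrapper may be the more future-proof choice if the builtin set changes again.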