双打定义错误的CUDA atomicAdd [英] CUDA atomicAdd for doubles definition error

查看:40
本文介绍了双打定义错误的CUDA atomicAdd的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在以前版本的 CUDA 中,没有为双精度实现 atomicAdd,因此通常像 这里.使用新的 CUDA 8 RC,当我尝试编译包含此类功能的代码时遇到了麻烦.我猜这是因为在 Pascal 和 Compute Capability 6.0 中添加了 atomicAdd 的本机双版本,但不知何故,以前的 Compute Capabilities 并没有正确地忽略它.

In previous versions of CUDA, atomicAdd was not implemented for doubles, so it is common to implement this like here. With the new CUDA 8 RC, I run into troubles when I try to compile my code which includes such a function. I guess this is due to the fact that with Pascal and Compute Capability 6.0, a native double version of atomicAdd has been added, but somehow that is not properly ignored for previous Compute Capabilities.

下面的代码在以前的 CUDA 版本中可以正常编译和运行,但现在我得到了这个编译错误:

The code below used to compile and run fine with previous CUDA versions, but now I get this compilation error:

test.cu(3): error: function "atomicAdd(double *, double)" has already been defined

但是如果我删除我的实现,我会得到这个错误:

But if I remove my implementation, I instead get this error:

test.cu(33): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, double)

我应该补充一点,我只有在使用 -arch=sm_35 或类似代码进行编译时才会看到这一点.如果我使用 -arch=sm_60 进行编译,我会得到预期的行为,即只有第一个错误,而在第二种情况下编译成功.

I should add that I only see this if I compile with -arch=sm_35 or similar. If I compile with -arch=sm_60 I get the expected behavior, i.e. only the first error, and successful compilation in the second case.

另外,它是特定于 atomicAdd 的——如果我更改名称,它会很好用.

Also, it is specific for atomicAdd -- if I change the name, it works well.

它看起来真的像一个编译器错误.其他人可以确认是这种情况吗?

It really looks like a compiler bug. Can someone else confirm that this is the case?

示例代码:

__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void kernel(double *a)
{
    double b=1.3;
    atomicAdd(a,b);
}

int main(int argc, char **argv)
{
    double *a;
    cudaMalloc(&a,sizeof(double));

    kernel<<<1,1>>>(a);

    cudaFree(a);
    return 0;
}

<小时>

我从 Nvidia 那里得到了一个认识到这个问题的答案,以下是开发人员对此的看法:


I got an answer from Nvidia who recognize this problem, and here is what the developers say about it:

CUDA 8.0 新支持的 sm_60 架构具有本机 fp64 atomicAdd 函数.由于我们的局限性工具链和CUDA语言,这个函数的声明需要即使没有专门为代码编译,也存在sm_60.这会导致您的代码出现问题,因为您还定义了一个fp64 atomicAdd 函数.

The sm_60 architecture, that is newly supported in CUDA 8.0, has native fp64 atomicAdd function. Because of the limitations of our toolchain and CUDA language, the declaration of this function needs to be present even when the code is not being specifically compiled for sm_60. This causes a problem in your code because you also define a fp64 atomicAdd function.

诸如 atomicAdd 之类的 CUDA 内置函数是实现定义的并且可以在 CUDA 版本之间更改.用户不应定义与任何 CUDA 内置函数同名的函数.我们会建议您将 atomicAdd 函数重命名为不是与任何 CUDA 内置函数相同.

CUDA builtin functions such as atomicAdd are implementation-defined and can be changed between CUDA releases. Users should not define functions with the same names as any CUDA builtin functions. We would suggest you to rename your atomicAdd function to one that is not the same as any CUDA builtin functions.

推荐答案

那种 atomicAdd 的味道是为计算能力 6.0 引入的一种新方法.您可以使用宏定义来保护您之前实现的其他计算能力

That flavor of atomicAdd is a new method introduced for compute capability 6.0. You may keep your previous implementation of other compute capabilities guarding it using macro definition

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
<... place here your own pre-pascal atomicAdd definition ...>
#endif

这个名为架构识别宏的宏被记录在 这里:

This macro named architecture identification macro is documented here:

5.7.4.虚拟架构识别宏

5.7.4. Virtual Architecture Identification Macro

架构标识宏__CUDA_ARCH__ 在为compute_xy 编译的每个nvcc 编译阶段1 期间被分配一个三位值字符串xy0(以文字0 结尾).

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

此宏可用于 GPU 函数的实现,以确定当前正在为其编译的虚拟架构.主机代码(非 GPU 代码)不能依赖它.

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.

我假设 NVIDIA 没有将它放在以前的 CC 中,以避免用户定义它的冲突,而不是迁移到 Compute Capability >= 6.x.不过,我不会认为这是一个 BUG,而是一种发布交付实践.

I assume NVIDIA did not place it for previous CC to avoid conflict for users defining it and not moving to Compute Capability >= 6.x. I would not consider it a BUG though, rather a release delivery practice.

编辑:宏保护不完整(已修复) - 这里是一个完整的示例.

EDIT: macro guard was incomplete (fixed) - here a complete example.

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
__device__ double atomicAdd(double* a, double b) { return b; }
#endif

__device__ double s_global ;
__global__ void kernel () { atomicAdd (&s_global, 1.0) ; }


int main (int argc, char* argv[])
{
        kernel<<<1,1>>> () ;
        return ::cudaDeviceSynchronize () ;
}

编译:

$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

命令行(均成功):

$> nvcc main.cu -arch=sm_60
$> nvcc main.cu -arch=sm_35

您可能会发现为什么它适用于包含文件:sm_60_atomic_functions.h,如果 __CUDA_ARCH__ 低于 600,则不会声明该方法.

You may find why it works with the include file: sm_60_atomic_functions.h, where the method is not declared if __CUDA_ARCH__ is lower than 600.

这篇关于双打定义错误的CUDA atomicAdd的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆