CUDA atomicAdd for doubles definition error


Problem description

In previous versions of CUDA, atomicAdd was not implemented for doubles, so it was common to implement it yourself, as in the code below. With the new CUDA 8 RC, I run into trouble when I try to compile code that includes such a function. I guess this is because, with Pascal and Compute Capability 6.0, a native double version of atomicAdd has been added, but somehow it is not properly ignored for previous compute capabilities.

The code below used to compile and run fine with previous CUDA versions, but now I get this compilation error:

test.cu(3): error: function "atomicAdd(double *, double)" has already been defined

But if I remove my implementation, I instead get this error:

test.cu(33): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, double)

I should add that I only see this if I compile with -arch=sm_35 or similar. If I compile with -arch=sm_60 I get the expected behavior, i.e. only the first error, and successful compilation in the second case.
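
For concreteness, assuming the file is saved as test.cu (as the error messages suggest), these are the two invocations:

$> nvcc test.cu -arch=sm_35    # shows the unexpected behavior
$> nvcc test.cu -arch=sm_60    # shows the expected behavior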

Also, this is specific to atomicAdd -- if I rename my function, it compiles fine.

It really looks like a compiler bug. Can someone else confirm that this is the case?

Example code:

__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void kernel(double *a)
{
    double b=1.3;
    atomicAdd(a,b);
}

int main(int argc, char **argv)
{
    double *a;
    cudaMalloc(&a,sizeof(double));

    kernel<<<1,1>>>(a);

    cudaFree(a);
    return 0;
}


I got an answer from Nvidia, who acknowledge the problem; here is what the developers say about it:

The sm_60 architecture, that is newly supported in CUDA 8.0, has native fp64 atomicAdd function. Because of the limitations of our toolchain and CUDA language, the declaration of this function needs to be present even when the code is not being specifically compiled for sm_60. This causes a problem in your code because you also define a fp64 atomicAdd function.

CUDA builtin functions such as atomicAdd are implementation-defined and can be changed between CUDA releases. Users should not define functions with the same names as any CUDA builtin functions. We would suggest you to rename your atomicAdd function to one that is not the same as any CUDA builtin functions.
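
For illustration, here is a minimal sketch of the rename the developers suggest (the name myAtomicAdd is an arbitrary choice, not from the post), reusing the CAS-based implementation from the question:

__device__ double myAtomicAdd(double* address, double val)
{
    // Same CAS loop as above, under a name that cannot collide with the
    // builtin double atomicAdd introduced for compute capability 6.0.
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}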

Recommended answer

That flavor of atomicAdd is a new method introduced for compute capability 6.0. You may keep your previous implementation for other compute capabilities by guarding it with a macro definition:

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
<... place here your own pre-pascal atomicAdd definition ...>
#endif

5.7.4. Virtual Architecture Identification Macro

The architecture identification macro __CUDA_ARCH__ is assigned a three-digit value string xy0 (ending in a literal 0) during each nvcc compilation stage 1 that compiles for compute_xy.

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.
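
As a small illustration (a sketch, not part of the quoted documentation), a device function can report the virtual architecture it is being compiled for, while host code leaves the macro alone:

#include <cstdio>

__global__ void which_arch()
{
#if defined(__CUDA_ARCH__)
    // e.g. 350 when compiling for compute_35, 600 for compute_60
    printf("device code compiled with __CUDA_ARCH__ = %d\n", __CUDA_ARCH__);
#endif
}

int main()
{
    which_arch<<<1,1>>>();
    return ::cudaDeviceSynchronize();
}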

I assume NVIDIA chose not to declare it for previous compute capabilities to avoid conflicts for users who define it themselves and have not moved to Compute Capability >= 6.x. I would not consider it a bug, though, but rather a release delivery practice.

EDIT: the macro guard was incomplete (now fixed) -- here is a complete example.

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#else
// Placeholder pre-Pascal definition, kept minimal to show that the guard
// compiles; in real code the CAS-based implementation from the question goes here.
__device__ double atomicAdd(double* a, double b) { return b; }
#endif

__device__ double s_global ;
__global__ void kernel () { atomicAdd (&s_global, 1.0) ; }


int main (int argc, char* argv[])
{
        kernel<<<1,1>>> () ;
        return ::cudaDeviceSynchronize () ;
}

Compiled with:

$> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

Command lines (both succeed):

$> nvcc main.cu -arch=sm_60
$> nvcc main.cu -arch=sm_35

You can see why this works by looking at the include file sm_60_atomic_functions.h, where the method is not declared if __CUDA_ARCH__ is lower than 600.
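
For context, the guard in that header has essentially this shape (a simplified paraphrase, not the verbatim header contents):

// Sketch of the relevant guard in sm_60_atomic_functions.h (paraphrased)
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
__device__ double atomicAdd(double *address, double val);
#endif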
