Why has atomicAdd not been implemented for doubles?


Question

Why hasn't atomicAdd() for doubles been implemented explicitly as part of CUDA 4.0 or higher?

According to Appendix F (page 97) of the CUDA Programming Guide 4.1, the following versions of atomicAdd have been implemented:

int atomicAdd(int* address, int val);
unsigned int atomicAdd(unsigned int* address,
                       unsigned int val);
unsigned long long int atomicAdd(unsigned long long int* address,
                                 unsigned long long int val);
float atomicAdd(float* address, float val);

The same page goes on to give a small implementation of atomicAdd for doubles, shown below, which I have just started using in my project:

__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull =
                             (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                               __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

Why not define the above code as a part of CUDA?

Recommended answer

As of CUDA 8, double-precision atomicAdd() is implemented in CUDA with hardware support in SM_6X (Pascal) GPUs.
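
Code that has to build for both older and Pascal-or-newer targets could select between the built-in and the CAS loop at compile time. The following is a minimal sketch of that idea, not an official API: the names atomicAddDouble, accumulate, and countOnDevice are hypothetical, and it assumes the __CUDA_ARCH__ macro is used to detect compute capability 6.0 or higher. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical wrapper (not part of CUDA): picks the hardware atomic when
// compiling for compute capability 6.0+, and the CAS loop otherwise.
__device__ double atomicAddDouble(double* address, double val)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    // Built-in double overload, available from CUDA 8 on sm_60 and newer.
    return atomicAdd(address, val);
#else
    // Fallback: the CAS-based loop from the programming guide.
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // compares bit patterns, not floating-point values
    return __longlong_as_double(old);
#endif
}

// Every in-range thread adds 1.0 to the same accumulator.
__global__ void accumulate(double* sum, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAddDouble(sum, 1.0);
}

// Host helper (hypothetical): launches n threads and returns the total.
double countOnDevice(int n)
{
    double* d_sum = nullptr;
    double h_sum = 0.0;
    cudaMalloc(&d_sum, sizeof(double));
    cudaMemcpy(d_sum, &h_sum, sizeof(double), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    accumulate<<<blocks, threads>>>(d_sum, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    return h_sum;
}
```

Calling countOnDevice(1000) from host code should return 1000.0 on either path, since each of the 1000 threads contributes exactly 1.0. Giving the wrapper its own name avoids clashing with the built-in atomicAdd overload on newer architectures.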

When this answer was originally written, no CUDA devices supported atomicAdd for double in hardware. As you noted, it can be implemented in terms of atomicCAS on 64-bit integers, but there is a non-trivial performance cost for that.
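
One way to see where that cost comes from is to count how often the CAS loop has to retry when many threads hit the same address. The sketch below is only an illustration under stated assumptions: casAddCounting, contend, ContentionResult, and measureContention are hypothetical names, and the extra counter adds atomic traffic of its own, so this is a diagnostic rather than a benchmark. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical instrumented variant of the CAS loop: identical logic, plus a
// global counter that is bumped each time the compare-and-swap loses a race.
__device__ double casAddCounting(double* address, double val,
                                 unsigned long long int* retries)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // A mismatch means another thread updated the value first, so this
        // thread must recompute the sum and try again.
        if (old != assumed) atomicAdd(retries, 1ULL);
    } while (assumed != old);
    return __longlong_as_double(old);
}

// All in-range threads add 1.0 to one address: the worst case for contention.
__global__ void contend(double* sum, unsigned long long int* retries, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) casAddCounting(sum, 1.0, retries);
}

struct ContentionResult {
    double sum;                      // final accumulated value
    unsigned long long int retries;  // number of failed CAS attempts
};

// Host helper (hypothetical): runs n contending threads and reports both values.
ContentionResult measureContention(int n)
{
    double* d_sum = nullptr;
    unsigned long long int* d_retries = nullptr;
    cudaMalloc(&d_sum, sizeof(double));
    cudaMalloc(&d_retries, sizeof(unsigned long long int));
    cudaMemset(d_sum, 0, sizeof(double));  // all-zero bits represent 0.0
    cudaMemset(d_retries, 0, sizeof(unsigned long long int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    contend<<<blocks, threads>>>(d_sum, d_retries, n);

    ContentionResult result;
    cudaMemcpy(&result.sum, d_sum, sizeof(double), cudaMemcpyDeviceToHost);
    cudaMemcpy(&result.retries, d_retries, sizeof(unsigned long long int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_sum);
    cudaFree(d_retries);
    return result;
}
```

The sum is deterministic (measureContention(4096).sum should be 4096.0), while the retry count depends on the device and on scheduling, so it will vary from run to run. Each retry is a full extra read-modify-write round trip, which is the overhead a single hardware atomic avoids.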

Therefore, the CUDA software team chose to document a correct implementation as an option for developers, rather than make it part of the CUDA standard library. This way developers are not unknowingly opting in to a performance cost they don't understand.

Aside: I don't think this question should be closed as "not constructive". I think it's a perfectly valid question, +1.
