Define atomicAdd function doesn't work in CUDA


Question

Since CUDA 2.0x doesn't have an atomicAdd() function for double, I defined my own version, named atomicAddd(), following this question:

Why has atomicAdd not been implemented for doubles?

Here is the code for the device function:

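__device__ double atomicAddd(double* address, double val)
{
    // Reinterpret the double's storage as a 64-bit integer so atomicCAS can operate on it.
    unsigned long long int* address_as_ull =
                             (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Store assumed + val only if *address still holds assumed; otherwise retry.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                               __longlong_as_double(assumed)));
    } while (assumed != old);  // integer comparison, so a NaN value cannot loop forever
    return __longlong_as_double(old);  // value stored before this thread's addition
}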

The code is the same as in that question, apart from the function name.

Here is part of my kernel:

__global__ void test(double *dev_lik, double *dev_sum){
    __shared__ double lik;
    // some code to compute lik;
    // copy lik back to global dev_lik;
    dev_lik[blockIdx.x] = lik;

    // add lik to dev_sum
    if(threadIdx.x == 0){
        atomicAddd(dev_sum, lik);
    }
}

After the kernel runs, I copy dev_lik back to the host and add its elements up into sum. I also copy dev_sum back to the host as sum1. My understanding is that sum should be the same as sum1. Here is my host code that prints them:

for (int m = 0; m < 100; ++m){
    if(sum[m] == sum1[m]){
        std::cout << "True" << std::endl;
    }
    else{
        std::cout << "False" << "\t" << std::setprecision(20) << sum[m] << "\t" << sum1[m] << std::endl;
    }
}

The result I get is as follows:

True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False   -1564.0205173292260952  -1564.0205173292256404
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False   -1563.4011523293495429  -1563.4011523293493156
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True



Some results show False, but the difference between sum and sum1 is very small. I have no idea what the problem is.

Recommended answer

Unlike mathematical addition, floating-point addition is not associative, because every operation involves a rounding step. In situations where atomic operations are necessary, the order in which the additions execute is not deterministic, so nondeterministic rounding differences are inevitable. The host-side sum and the atomically accumulated sum can therefore disagree in the last few bits, and an exact == comparison between them is not reliable.
