原子函数是否真的使变量在CUDA中不稳定? [英] Does atomic functions really make variables volatile in CUDA?

查看:155
本文介绍了原子函数是否真的使变量在CUDA中不稳定?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写了一个非常简单的代码,请求线程0更新一个全局变量,而其他线程保持读取该变量。但我发现其他线程并不真正得到该值。


代码在这里,很简单。任何人都可以给我任何建议如何解决它?
非常感谢




  __ global__ void addKernel(int * c)
{
int i = threadIdx.x;
int j = 0;
if(i == 0)
{
while(* c <2000){
int temp = * c;
printf(* c =%d \\\
,* c);
atomicCAS(c,temp,temp + 1);
}
} else {
while(* c <1000)
{
j ++;
}
}

}

解决方案

我想做一个类比:想象一下原子操作是互斥体:为了一个程序被明确定义,两个线程访问共享资源必须两者同意使用互斥体专有地访问资源。如果其中一个线程在没有持有mutex的情况下访问资源,那么结果是未定义的。



原子同样如此:如果你决定处理一个特定的位置在内存中作为一个原子变量,那么所有访问该位置的线程都应该同意并将其作为程序的含义。您应该通过原子加载和存储操作它,而不是非原子操作和原子操作的组合。



换句话说, :

  atomicCAS(c,temp,temp + 1); 

包含原子加载比较存储。结果指令将一直向下到全局内存以加载 c ,进行比较,并一直向下到全局内存

c $ c> while(* c <2000)

编译器(和硬件)不知道 c 可能已被另一个线程修改。因此,不是一路下降到全局内存,它只是从最快的可用缓存读取。编译器甚至可以把变量放在一个寄存器中,因为它在当前线程中没有看到任何其他人修改它。



你想要的是(虚构的):

  while(atomicLoad(c)< 2000)

但是据我所知,在撰写本文时,CUDA没有这样的结构。



在这方面, volatile 限定符帮助:它告诉编译器不优化变量,它是可以从外部来源。这将触发每次读取变量的负载,虽然我不知道这个负载绕过所有的缓存。在实践中,它可能工作,但在理论上我不认为你应该依赖它。此外,这也将禁用对该变量的任何优化(例如常数传播或将变量提升到寄存器以获得更好的性能)。



您可能想尝试以下hack(我没有尝试过):

  while(atomicAdd(c,0)< 2000)

这会发出 从全局内存加载的原子指令,因此应该看到 c 的最近值。但是,它还引入了一个(在这种情况下是无用的)原子存储。


I've written a very simple code ask thread 0 to update a global variable while other threads keep reading that variable.But I found other threads don't really get the value.

Code is here, it is quite simple. Could anyone give me any suggestion how to fix it? Thanks a lot

__global__ void addKernel(int *c)
{
int i = threadIdx.x;
int j = 0;
if (i == 0)
{
    while(*c < 2000){
        int temp = *c;
        printf("*c = %d\n",*c);
        atomicCAS(c,temp, temp+1);
    }       
}else{
    while(*c < 1000)
    {
        j++;
    }
}

}

解决方案

I'd like to make an analogy: imagine for a second that atomic operations are mutexes: for a program to be well-defined, two threads accessing a shared resource must both agree to use the mutex to access the resource exclusively. If one of the threads accesses the resource without first holding the mutex, the result is undefined.

The same thing is true for atomics: if you decide to treat a particular location in memory as an atomic variable, then all threads accessing that location should agree and treat it as such for your program to have meaning. You should only be manipulating it through atomic loads and stores, not a combination of non-atomic and atomic operations.

In other words, this:

atomicCAS(c,temp, temp+1);

Contains an atomic load-compare-store. The resulting instruction will go all the way down to global memory to load c, do the comparison, and go all the way down to global memory to store the new value.

But this:

while(*c < 2000)

Is not atomic by any means. The compiler (and the hardware) has no idea that c may have been modified by another thread. So instead of going all the way down to global memory, it will simply read from the fastest available cache. Possibly the compiler will even put the variable in a register, because it doesn't see anyone else modifying it in the current thread.

What you would want is something like (imaginary):

while (atomicLoad(c) < 2000)

But to the best of my knowledge there is no such construct in CUDA at the time of writing.

In this regard, the volatile qualifier may help: it tells the compiler to not optimize the variable, and consider it as "modifiable from external sources". This will trigger a load for every read of the variable, although I am not sure this load bypasses all the caches. In practice, it may work, but in theory I don't think you should rely on it. Besides, this will also disable any optimizations on that variable (such as constant propagation or promoting the variable to a register for better performance).

You may want to try the following hack (I haven't tried it):

while(atomicAdd(c, 0) < 2000)

This will emit an atomic instruction that does load from global memory, and therefore should see the most recent value of c. However, it also introduces an (useless in this case) atomic store.

这篇关于原子函数是否真的使变量在CUDA中不稳定?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆