atomicCAS for bool implementation
Problem description
I'm trying to figure out whether there is a bug in the answer (now deleted) about implementing a CUDA-like atomicCAS for bools. The code from the answer (reformatted):
static __inline__ __device__ bool atomicCAS(bool *address, bool compare, bool val)
{
    unsigned long long addr = (unsigned long long)address;
    unsigned pos = addr & 7;              // byte position within the unsigned long long
    int *int_addr = (int *)(addr - pos);  // int-aligned address
    int old = *int_addr, assumed, ival;
    do
    {
        assumed = old;
        if(val)
            ival = old | (1 << (8 * pos));
        else
            ival = old & (~((0xFFU) << (8 * pos)));
        old = atomicCAS(int_addr, assumed, ival);
    } while(assumed != old);
    return (bool)(old & ((0xFFU) << (8 * pos)));
}
According to the documentation, atomicCAS should set *address to (*address == compare ? val : *address), but in the implementation above the compare argument is never used!
The code I use to reproduce the bug:
#include <cstdio>

// atomicCAS definition here

__device__ bool b;

__global__ void kernel()
{
    b = false;
    atomicCAS(&b, true, true);  // (b == true ? true : b); b is false, so b should stay false
    printf("%d\n", b);          // b is false => expected output is 0
}

int main()
{
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
}
The expected output is 0, but the actual output is 1.
I have a suggestion for how to fix it, but I am not 100% sure it is thread-safe:
static __inline__ __device__ bool atomicCAS(bool *address, bool compare, bool val)
{
    unsigned long long addr = (unsigned long long)address;
    unsigned pos = addr & 3;              // byte position within the int
    int *int_addr = (int *)(addr - pos);  // int-aligned address
    int old = *int_addr, assumed, ival;
    do
    {
        if(*address != compare)  // If we expected that bool to be different, then
            break;               // stop trying to update it and just return its current value
        assumed = old;
        if(val)
            ival = old | (1 << (8 * pos));
        else
            ival = old & (~((0xFFU) << (8 * pos)));
        old = atomicCAS(int_addr, assumed, ival);
    } while(assumed != old);
    return (bool)(old & ((0xFFU) << (8 * pos)));
}
My questions are:
- Is there a bug in the first code sample from the answer?
- If so, does the last code sample fix it in a thread-safe way?
Answer
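Many thanks to @RobertCrovella: the first code sample does contain a bug, and the second one does fix it, but it is not thread-safe (see the question comments for details). The second sample tests *address, which is read separately from the snapshot old that is passed to atomicCAS, so another thread can change the byte between the two reads and the swap can be decided on a value that is no longer the one being replaced. The thread-safe fix takes the decision from old itself: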
static __inline__ __device__ bool atomicCAS(bool *address, bool compare, bool val)
{
    unsigned long long addr = (unsigned long long)address;
    unsigned pos = addr & 3;              // byte position within the int
    int *int_addr = (int *)(addr - pos);  // int-aligned address
    int old = *int_addr, assumed, ival;
    bool current_value;
    do
    {
        // Read the bool from the same snapshot that atomicCAS will compare against
        current_value = (bool)(old & ((0xFFU) << (8 * pos)));
        if(current_value != compare)  // If we expected that bool to be different, then
            break;                    // stop trying to update it and just return its current value
        assumed = old;
        if(val)
            ival = old | (1 << (8 * pos));
        else
            ival = old & (~((0xFFU) << (8 * pos)));
        old = atomicCAS(int_addr, assumed, ival);
    } while(assumed != old);
    return current_value;  // value observed before any swap, like the built-in atomicCAS
}