Concurrent writes in the same global memory location


Question

I have several blocks, each holding some integers in a shared memory array of size 512. How can I check whether the array in each block contains a zero as an element?

What I am doing is creating an array that resides in global memory. The size of this array depends on the number of blocks, and it is initialized to 0. Each block then writes a[blockid] = 1 if its shared memory array contains a zero.

My problem arises when several threads in a single block write at the same time. That is, if the array in shared memory contains more than one zero, then several threads will write a[blockid] = 1. Would this cause any problem?

In other words, would it be a problem if two threads write exactly the same value to exactly the same array element in global memory?

Recommended answer

In the CUDA execution model, there is no guarantee that every simultaneous write from threads in the same block to the same global memory location will succeed. At least one write will work, but the programming model does not guarantee how many write transactions will occur, or in what order they will occur if more than one transaction is executed.
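For reference, the pattern described in the question might look like the sketch below. The names `mark_zero_naive`, `run_naive_check`, and `kThreads` are hypothetical, and the sketch assumes one thread per element and an input length that is a non-zero multiple of 512. Every thread that finds a zero stores the same value, so as long as at least one write lands, the element should end up as 1 regardless of how many transactions occur or in what order.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int kThreads = 512;  // assumed block size, matching the 512-element array

// Hypothetical kernel: every thread that sees a zero writes the same value.
__global__ void mark_zero_naive(const int* in, int* a) {
    __shared__ int s[kThreads];
    s[threadIdx.x] = in[blockIdx.x * kThreads + threadIdx.x];
    __syncthreads();  // kept for clarity; each thread only reads back its own element

    // Several threads may execute this store concurrently. All of them write 1,
    // so the final value does not depend on which write is performed last.
    if (s[threadIdx.x] == 0) a[blockIdx.x] = 1;
}

// Hypothetical host helper: returns one flag per block (1 = block contains a zero).
std::vector<int> run_naive_check(const std::vector<int>& host_in) {
    assert(!host_in.empty() && host_in.size() % kThreads == 0);
    const int num_blocks = static_cast<int>(host_in.size() / kThreads);

    int* d_in = nullptr;
    int* d_a = nullptr;
    cudaMalloc(&d_in, host_in.size() * sizeof(int));
    cudaMalloc(&d_a, num_blocks * sizeof(int));
    cudaMemcpy(d_in, host_in.data(), host_in.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_a, 0, num_blocks * sizeof(int));  // array starts at 0, as in the question

    mark_zero_naive<<<num_blocks, kThreads>>>(d_in, d_a);

    std::vector<int> host_a(num_blocks, 0);
    cudaMemcpy(host_a.data(), d_a, num_blocks * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_a);
    return host_a;
}
```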

If this is a problem, then a better approach (from a correctness point of view) would be to have only one thread from each block do the global write. You can use either a shared memory flag set atomically or a reduction operation to determine whether the value should be set. Which you choose might depend on how many zeros there are likely to be: the more zeros there are, the more attractive the reduction will be. CUDA includes the warp-level __any() and __all() operators, which can be built into a very efficient boolean reduction in a few lines of code.
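A minimal sketch of the single-writer approach follows, combining a warp-level vote with an atomically set shared flag. It is one possible arrangement, not necessarily what the answer's author had in mind. The names `flag_blocks_with_zero`, `run_zero_check`, and `kBlockSize` are hypothetical. The sketch uses `__any_sync()`, the CUDA 9+ successor to `__any()`, and assumes a block size of 512 (a multiple of the warp size, so the full lane mask is valid) and an input length that is a non-zero multiple of 512.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int kBlockSize = 512;  // assumed block size, matching the 512-element array

// Hypothetical kernel: exactly one thread per block writes to global memory.
__global__ void flag_blocks_with_zero(const int* in, int* has_zero) {
    __shared__ int tile[kBlockSize];
    __shared__ int block_flag;

    const int tid = threadIdx.x;
    tile[tid] = in[blockIdx.x * kBlockSize + tid];
    if (tid == 0) block_flag = 0;
    __syncthreads();  // block_flag is initialized before any warp updates it

    // Warp-level boolean reduction: non-zero if any lane in this warp holds a zero.
    const int warp_has_zero = __any_sync(0xffffffffu, tile[tid] == 0);

    // At most one lane per warp touches the shared flag, and it does so atomically.
    if ((tid % warpSize) == 0 && warp_has_zero) atomicOr(&block_flag, 1);
    __syncthreads();  // all warps have contributed before the flag is read

    // A single global write per block.
    if (tid == 0) has_zero[blockIdx.x] = block_flag;
}

// Hypothetical host helper: returns one flag per block (1 = block contains a zero).
std::vector<int> run_zero_check(const std::vector<int>& host_in) {
    assert(!host_in.empty() && host_in.size() % kBlockSize == 0);
    const int num_blocks = static_cast<int>(host_in.size() / kBlockSize);

    int* d_in = nullptr;
    int* d_flags = nullptr;
    cudaMalloc(&d_in, host_in.size() * sizeof(int));
    cudaMalloc(&d_flags, num_blocks * sizeof(int));
    cudaMemcpy(d_in, host_in.data(), host_in.size() * sizeof(int),
               cudaMemcpyHostToDevice);

    flag_blocks_with_zero<<<num_blocks, kBlockSize>>>(d_in, d_flags);

    std::vector<int> host_flags(num_blocks, 0);
    cudaMemcpy(host_flags.data(), d_flags, num_blocks * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_flags);
    return host_flags;
}
```

With 512 threads per block, this performs at most 16 shared-memory atomics (one per warp) and one global store per block. Because every block writes its own flag, the output array does not need to be zeroed beforehand.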
