GLSL per-pixel spinlock using imageAtomicCompSwap

Question

Example 11.13 in the OpenGL Red Book, 9th edition (OpenGL 4.5), is "Simple Per-Pixel Mutex". It calls imageAtomicCompSwap in a do {} while() loop to take a per-pixel lock, which is meant to prevent simultaneous access to a shared resource by pixel shader (PS) invocations that correspond to the same pixel coordinate.

layout (binding = 0, r32ui) uniform volatile coherent uimage2D lock_image;

void main(void)
{
    ivec2 pos = ivec2(gl_FragCoord.xy);

    // spinlock - acquire
    uint lock_available;
    do {
        lock_available = imageAtomicCompSwap(lock_image, pos, 0, 1);
    } while (lock_available != 0);

    // do some operations protected by the lock
    do_something();

    // spinlock - release
    imageStore(lock_image, pos, uvec4(0));
}

This example results in an APPCRASH on both Nvidia and AMD GPUs. I know that on these two platforms PS invocations cannot progress independently of each other: a subgroup of threads executes in lockstep and shares control flow (a "warp" of 32 threads in Nvidia's terminology). So the loop may deadlock.

However, the OpenGL spec nowhere mentions threads executing in lockstep. It only says, "The relative order of invocations of the same shader type are undefined." Why, then, can we not use the atomic operation imageAtomicCompSwap in this example to ensure exclusive access between different PS invocations? Does this mean that Nvidia and AMD GPUs do not conform to the OpenGL spec?

Recommended answer

"As in this example, why can we not use atomic operation imageAtomicCompSwap to ensure exclusive access between different PS invocations?"

If you use atomic operations to lock access to a pixel, you are relying on one aspect of relative order: that all threads will eventually make forward progress. That is, you assume that a thread spinning on the lock will never starve the thread that holds the lock of execution resources, so that the holder eventually makes forward progress and releases it.

But since the relative order of execution is undefined, there is no guarantee of any of that. And therefore, your code cannot work. Any code that relies on any aspect of ordering between the invocations of a single shader stage cannot work (unless there are specific guarantees in place).

This is precisely why ARB_fragment_shader_interlock exists.
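
As a rough illustration of what that extension offers, here is a minimal sketch of a critical section built on it instead of a hand-written spinlock. It assumes the driver exposes GL_ARB_fragment_shader_interlock. The image name counter_image and the read-modify-write inside the critical section are hypothetical stand-ins for whatever work needs protecting.

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

// Ask for one critical section per pixel, entered in primitive order.
layout(pixel_interlock_ordered) in;

// Hypothetical per-pixel resource that overlapping fragments update.
layout(binding = 0, r32ui) uniform coherent uimage2D counter_image;

void main(void)
{
    ivec2 pos = ivec2(gl_FragCoord.xy);

    // Only one invocation per pixel runs between begin and end at a time.
    beginInvocationInterlockARB();

    // Non-atomic read-modify-write, safe here because of the interlock.
    uint value = imageLoad(counter_image, pos).x;
    imageStore(counter_image, pos, uvec4(value + 1u));

    endInvocationInterlockARB();
}
```

Both calls sit directly in main() and outside any flow control, because the extension is restrictive about where they may be placed. The implementation schedules the critical section itself, so the shader does not depend on a spinning invocation yielding to the lock holder.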

That being said, even if there were guarantees of forward progress, your code would still be broken.

You use a non-atomic operation (imageStore) to release the lock. You should be using an atomic set operation, such as imageAtomicExchange.

Plus, as others have pointed out, you need to keep spinning while the return value from the atomic compare/swap is not zero. Remember: all atomic functions return the original value from the image. So if the original value that was atomically read is not 0, the comparison failed and you do not have the lock.

Now, your code will still be UB (undefined behavior) by the spec. But it's more likely to work.
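
The two fixes described above could look roughly like the following sketch. It keeps the loop structure from the question, spins while the returned original value is non-zero, and releases the lock with imageAtomicExchange. The empty do_something() body and the memoryBarrier() call before the release are assumptions added for illustration, not part of the original example.

```glsl
#version 450

layout(binding = 0, r32ui) uniform volatile coherent uimage2D lock_image;

// Hypothetical placeholder for the work protected by the lock.
void do_something(void)
{
}

void main(void)
{
    ivec2 pos = ivec2(gl_FragCoord.xy);

    // Acquire: imageAtomicCompSwap returns the value stored before the call.
    // 0 means this invocation swapped in 1 and owns the lock;
    // any other value means another invocation holds it, so keep spinning.
    uint previous;
    do {
        previous = imageAtomicCompSwap(lock_image, pos, 0u, 1u);
    } while (previous != 0u);

    do_something();

    // Assumption: order the protected writes before the release.
    memoryBarrier();

    // Release with an atomic exchange rather than a plain imageStore.
    imageAtomicExchange(lock_image, pos, 0u);
}
```

This still depends on the lock holder making forward progress while other invocations spin, which the spec does not promise, so it can deadlock on hardware that runs invocations in lockstep.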
