Compute shader shared memory contains artifacts

Problem description

I've been trying to write a general compute shader Gaussian blur implementation.

It basically works, but it contains artifacts that change every frame, even when the scene is static. I've spent the past few hours trying to debug this. I've gone as far as ensuring bounds aren't exceeded, unrolling all the loops, and replacing uniforms with constants, yet the artifacts persist.

I've tested the original code with artifacts on 3 different machines/GPUs (2 NVIDIA, 1 Intel), and they all produce the same results. Simulating the unrolled/constant version of the code's execution in plain C++, with workgroups executed both forwards and backwards, doesn't produce these errors.

By allocating a shared array of [96][96] instead of [16][48] I can eliminate most of the artifacts.

This made me think I was missing a logic error, so I produced a very simple shader that still reproduces the error on a smaller scale. I'd appreciate it if someone could point out the cause. I've checked a lot of documentation and can't find anything incorrect.

A shared array of 16x48 floats is allocated. This is 3072 bytes, roughly 10% of the minimum shared memory limit.
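
For reference, the arithmetic behind that figure can be written out as below. It assumes 4-byte floats and the 32768-byte minimum that OpenGL 4.3 requires for GL_MAX_COMPUTE_SHARED_MEMORY_SIZE; the actual limit on a given driver may be higher.

```latex
\[
16 \times 48 \times 4\ \text{bytes} = 3072\ \text{bytes},
\qquad
\frac{3072}{32768} \approx 9.4\%
\]
```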

The shader is launched in 16x16 workgroups, so each thread writes to 3 unique locations and reads back from a single unique location.

The texture is then rendered as HSV, whereby values between 0 and 1 map to hue 0-360 (red-cyan-red), and out-of-bounds values are red.

#version 430
//Execute in 16x16 sized thread blocks
layout(local_size_x=16,local_size_y=16) in;
uniform layout (r32f) restrict writeonly image2D _imageOut;
shared float hoz[16][48];
void main () 
{
    //Init shared memory with a big out of bounds value we can identify
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y] = 20000.0f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+16] = 20000.0f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+32] = 20000.0f;
    //Sync shared memory
    memoryBarrierShared();
    //Write the values we want to actually read back
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y] = 0.5f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+16] = 0.5f;
    hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+32] = 0.5f;
    //Sync shared memory
    memoryBarrierShared();
    //i=0,8,16 work
    //i=1-7,9-5,17 don't work (haven't bothered testing further
    const int i = 17;
    imageStore(_imageOut, ivec2(gl_GlobalInvocationID.xy), vec4(hoz[gl_LocalInvocationID.x][gl_LocalInvocationID.y+i]));
    //Sync shared memory (can't hurt)
    memoryBarrierShared();
}

Launching this shader with launch dimensions greater than 8x8 produces artifacts in the affected area of the image.

glDispatchCompute(9,9,0);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

I had to set a breakpoint and step through frames to capture this; it took around 14 frames.

glDispatchCompute(512/16,512/16,0);//Full image is 512x512
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);

Again, I had to set a breakpoint and step through frames to capture this error when running at 60 FPS (vsync).

Recommended answer

memoryBarrierShared();

No, that only makes writes visible to other invocations. You have to make sure that all of the writes have actually happened if you want to be able to read from other invocations' data.

That is done with the barrier() function, which should be called after memoryBarrierShared().
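
As a minimal sketch of what that looks like, here is the question's test shader with barrier() added after memoryBarrierShared(). It has not been run against the asker's setup. The local variable lid is introduced here for brevity and is not in the original, and the layout qualifier is moved in front of uniform, which is the ordering every GLSL version accepts.

```glsl
#version 430
// Execute in 16x16 thread blocks, as in the question
layout(local_size_x = 16, local_size_y = 16) in;
layout(r32f) uniform restrict writeonly image2D _imageOut;
shared float hoz[16][48];

void main()
{
    // Hypothetical shorthand, not in the original shader
    uvec2 lid = gl_LocalInvocationID.xy;

    // Sentinel values, then the real values. No synchronization is needed
    // between these two groups of writes, because each invocation only
    // overwrites the three slots it wrote itself.
    hoz[lid.x][lid.y]       = 20000.0;
    hoz[lid.x][lid.y + 16u] = 20000.0;
    hoz[lid.x][lid.y + 32u] = 20000.0;

    hoz[lid.x][lid.y]       = 0.5;
    hoz[lid.x][lid.y + 16u] = 0.5;
    hoz[lid.x][lid.y + 32u] = 0.5;

    // Make this invocation's shared writes visible to the others...
    memoryBarrierShared();
    // ...and wait until every invocation in the work group has reached
    // this point, so that all of those writes have actually happened.
    barrier();

    // With i = 17 the slot read below was written by a different
    // invocation, which is why the execution barrier is required.
    const uint i = 17u;
    imageStore(_imageOut, ivec2(gl_GlobalInvocationID.xy),
               vec4(hoz[lid.x][lid.y + i]));
}
```

The trailing memoryBarrierShared() from the original is dropped because nothing reads shared memory after the image store. Note that barrier() must be reached by every invocation in the work group, so it should stay outside any non-uniform branch.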
