OpenGL Uniform Buffer std140 layout

Question

I'm trying to pass an array of ints to the fragment shader via a uniform block (everything follows GLSL "#version 330") on a GeForce 8600 GT.

On the application side I have:

int MyArray[7102];
…
//filling, binding, etc
…
glBufferData(GL_UNIFORM_BUFFER, sizeof(MyArray), MyArray, GL_DYNAMIC_DRAW);

In my fragment shader I declare the corresponding block as follows:

layout (std140) uniform myblock
{
int myarray[7102];
};

The problem is that after a successful glCompileShader, glLinkProgram returns an error saying that it can't bind the appropriate storage resource.

Some additional facts:

1) GL_MAX_UNIFORM_BLOCK_SIZE returns 65536.

2) If I lower the number of elements to 4096, it works fine, and it makes no difference whether I use "int" or "ivec4" as the array type. Anything above 4096 gives me the same "storage error".

3) If I use "shared" or "packed", everything works as expected.

After consulting the GLSL 3.3 specification for std140, I'm assuming that there is a problem with alignment/padding according to:

"1) If the member is a scalar consuming N basic machine units, the base alignment is N.

...

4) If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment."

My questions:

1) Is it true that "myblock" occupies 4 times more than just 7102*4=28408 bytes? I.e., does std140 expand each member of myarray to a vec4, so that the real memory usage is 7102*4*4=113632 bytes, which is the cause of the problem?

2) The reason it works with "shared" or "packed" is due to the elimination of these gaps because of optimization?

3) Maybe it's a driver bug? All facts point to the "…and rounded up to the base alignment of a vec4" being the reason, but it's quite hard to accept that something as simple as an array of ints ends up being 4 times less efficient in terms of memory constraints.

4) If it's not a bug, then how should I organize and access an array in the case of std140? I can use "ivec4" for optimal data distribution, but then, instead of a simple x=myarray[i], I have to sacrifice performance by doing something like x=myarray[i/4][i%4] to refer to individual elements of each ivec4? Or am I missing something, and is there an obvious solution?

Recommended answer

1) (…) rounded up to the base alignment of a vec4? (…)

Yes.

2) The reason it works with "shared" or "packed" is due to the elimination of these gaps because of optimization?

Yes; except that this is not an optimization performance-wise.

3) Maybe it’s a driver bug?

No. GPUs naturally work with vectorized types. Packing the types requires additional instructions to de-/multiplex the vectors. EDIT: Since this answer was written, GPU architectures have changed significantly. GPUs made these days are all single-scalar architectures, with the design emphasis on strong superscalar vectorization.

4) If it’s not a bug, then how should I organize and access an array in case of std140?

Don't use uniform buffer objects for such large data. Put the data into a 1D texture and use texelFetch to index into it.
