Is instancing faster on GPU?

Question

Is there any performance gain when rendering instanced geometry in a GPU-limited application? Or is it all about draw calls?

Isn't it better to just bake all objects into a single VBO and render them with a single draw call? Assume all objects are static and there is enough vertex memory.

Recommended answer

If an instanced model is small enough to fit entirely within the GPU's pre-T&L cache, then instancing can be a performance boost on the GPU. Unless that's the case, the GPU has to read the same mesh data for each instance, so 1 mesh instanced 200 times has the same bandwidth cost as 200 separate meshes.
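The bandwidth argument above can be put into a toy model. This is only a sketch of the reasoning, not a measurement: the function names, the mesh size, and the cache size are all assumptions chosen for illustration, and a real GPU cache does not behave as an all-or-nothing threshold.

```python
def instanced_fetch_bytes(mesh_bytes, instance_count, cache_bytes):
    """Toy estimate of vertex data read from memory for one instanced draw."""
    if mesh_bytes <= cache_bytes:
        # The whole mesh stays cached: read once, reused by every instance.
        return mesh_bytes
    # The mesh does not fit: every instance re-reads the same mesh data.
    return mesh_bytes * instance_count


def baked_fetch_bytes(mesh_bytes, instance_count):
    """Baked buffer: every copy is distinct memory, so every copy is read."""
    return mesh_bytes * instance_count


MESH_BYTES = 10_000 * 32   # assumed: 10,000 vertices at 32 bytes each
CACHE_BYTES = 64 * 1024    # assumed cache size, not a real GPU figure

print("instanced:", instanced_fetch_bytes(MESH_BYTES, 200, CACHE_BYTES))
print("baked:    ", baked_fetch_bytes(MESH_BYTES, 200))
```

With these assumed numbers the mesh is larger than the cache, so both paths read the same amount of data. Only a mesh small enough to stay cached would favor instancing in this model.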

Isn't it better to just bake all objects to a single VBO and render them with a single draw call?

No. Just because instancing doesn't necessarily gain you on-GPU performance, that doesn't mean you should ditch the whole thing. If instancing is appropriate for you, then you are rendering the same mesh many times, so "bake all objects" would repeat the same mesh data once for every instance you intend to draw. Even if you don't save any read-time bandwidth, that is still hugely wasteful in memory.
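To make the memory cost concrete, here is a rough back-of-the-envelope sketch. The vertex count, the 32-byte vertex stride, and the 64-byte per-instance record (the size of one 4x4 float matrix) are assumptions for illustration, not figures from the question.

```python
def baked_buffer_bytes(vertex_count, vertex_stride, instance_count):
    """Baked approach: one full copy of the mesh per object."""
    return vertex_count * vertex_stride * instance_count


def instanced_buffer_bytes(vertex_count, vertex_stride,
                           instance_count, per_instance_stride):
    """Instancing: one copy of the mesh plus a small per-instance record."""
    mesh = vertex_count * vertex_stride
    per_instance = instance_count * per_instance_stride
    return mesh + per_instance


# Assumed example: 10,000 vertices, 32 bytes per vertex, 200 instances,
# 64 bytes of per-instance data (one 4x4 matrix of 32-bit floats).
baked = baked_buffer_bytes(10_000, 32, 200)
instanced = instanced_buffer_bytes(10_000, 32, 200, 64)
print(f"baked:     {baked:>12,} bytes")
print(f"instanced: {instanced:>12,} bytes")
```

Under these assumptions the baked buffer is roughly 200 times larger, because the per-instance records are tiny next to the mesh itself.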

Don't discount the importance of memory. Wasting memory can lead to runtime performance problems, as it can force textures out of GPU memory and cause thrashing.

Plus, it's less flexible. On one frame, you might only render 128 instances. On another, you might need 156. On another, you might only need 5. With your way, you have to keep around enough buffer storage to render some maximum number of instances. With actual instancing, the instance count is just a parameter of the draw call, so you don't care.

And that doesn't even deal with how to get per-instance data. With instancing, you can either use gl_InstanceID to read from some UBO/SSBO/texture array, or you can use instanced arrays, so that a vertex attribute is filled on a per-instance basis.
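The two ways of fetching per-instance data can be mimicked on the CPU. The sketch below is plain Python, not OpenGL or GLSL: every function name is hypothetical, and the lists stand in for GPU buffers. It only illustrates which index selects the per-instance record in each scheme.

```python
def fetch_by_instance_id(instance_data, instance_id):
    """Mimics a shader indexing a UBO/SSBO array with gl_InstanceID."""
    return instance_data[instance_id]


def fetch_instanced_attribute(attribute_buffer, instance_id, divisor=1):
    """Mimics an instanced array: the attribute advances once every
    `divisor` instances instead of once per vertex."""
    return attribute_buffer[instance_id // divisor]


def draw_instanced(mesh, offsets, instance_count):
    """Runs a stand-in 'vertex shader' over every vertex of every instance."""
    out = []
    for instance_id in range(instance_count):
        ox, oy = fetch_by_instance_id(offsets, instance_id)
        for x, y in mesh:
            out.append((x + ox, y + oy))
    return out


mesh = [(0, 0), (1, 0)]                 # one shared mesh
offsets = [(0, 0), (10, 0), (20, 0)]    # one record per instance
print(draw_instanced(mesh, offsets, instance_count=2))
```

Note that the instance count is only an argument here: drawing 2 or 3 instances uses the same mesh and the same offsets buffer.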

You can emulate gl_InstanceID with an extra integer attribute in your "bake all objects" buffer, but now you've made each vertex 4 bytes bigger. Emulating instanced arrays is a non-starter, since every vertex would have to carry a full copy of its instance's data, which would be hugely wasteful in memory.
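A quick estimate shows why the two emulations differ so much in cost. As before, the counts and the 64-byte per-instance record are assumed values for illustration, and the function names are hypothetical.

```python
def emulated_id_overhead(vertex_count, instance_count, id_bytes=4):
    """Extra bytes from storing an integer instance ID in every baked vertex."""
    return vertex_count * instance_count * id_bytes


def emulated_array_overhead(vertex_count, instance_count, per_instance_stride):
    """Extra bytes from copying the per-instance record into every baked vertex."""
    return vertex_count * instance_count * per_instance_stride


def real_instanced_array_bytes(instance_count, per_instance_stride):
    """A real instanced array stores each per-instance record exactly once."""
    return instance_count * per_instance_stride


# Assumed example: 10,000 vertices per mesh, 200 instances, 64-byte records.
print("emulated ID:    ", emulated_id_overhead(10_000, 200))
print("emulated array: ", emulated_array_overhead(10_000, 200, 64))
print("instanced array:", real_instanced_array_bytes(200, 64))
```

In this example the emulated ID adds 8 MB on top of the baked mesh data, while emulating the instanced array adds 128 MB for information that a real instanced array holds in under 13 KB.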
