Non-interleaved vertex buffers DirectX11


Problem description

If my vertex positions are shared, but my normals and UVs are not (to preserve hard edges and the like), is it possible to use non-interleaved buffers in DirectX11 to handle this memory layout, so that I could still use an index buffer with it? Or should I stick with duplicated vertex positions in an interleaved buffer?

And are there any performance concerns between interleaved and non-interleaved vertex buffers? Thank you!

Solution

How to

There are several ways. I'll describe the simplest one.

Just create separate vertex buffers:

ID3D11Buffer* positions;
ID3D11Buffer* texcoords;
ID3D11Buffer* normals;
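
A minimal sketch of how one of these buffers might be created; the names device and positionData are assumptions (an ID3D11Device* and a std::vector of positions), and the other two buffers are created the same way from their own arrays:

// Sketch only (assumed names: device, positionData). Creates the positions
// buffer; texcoords and normals are created the same way from their own data.
#include <d3d11.h>
#include <DirectXMath.h>
#include <vector>

ID3D11Buffer* CreatePositionsBuffer(ID3D11Device* device,
                                    const std::vector<DirectX::XMFLOAT3>& positionData)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = static_cast<UINT>(positionData.size() * sizeof(DirectX::XMFLOAT3));
    desc.Usage     = D3D11_USAGE_DEFAULT;        // use D3D11_USAGE_DYNAMIC if the CPU updates it often
    desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = positionData.data();

    ID3D11Buffer* buffer = nullptr;
    device->CreateBuffer(&desc, &init, &buffer); // check the HRESULT in real code
    return buffer;
}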

Create input layout elements, incrementing the InputSlot member for each component:

{ "POSITION",  0,  DXGI_FORMAT_R32G32B32_FLOAT,  0, 0,                            D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "TEXCOORD",  0,  DXGI_FORMAT_R32G32_FLOAT,     1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "NORMAL",    0,  DXGI_FORMAT_R32G32B32_FLOAT,  2, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
                                             //  ^
                                             // InputSlot
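
For completeness, a hedged sketch of turning those elements into an input layout object; here elements stands for the D3D11_INPUT_ELEMENT_DESC array shown above, and device, vsBytecode and vsBytecodeSize (the compiled vertex shader blob) are assumed to already exist:

// Sketch only: elements is the D3D11_INPUT_ELEMENT_DESC array above;
// device, vsBytecode and vsBytecodeSize are assumed (compiled vertex shader blob).
ID3D11InputLayout* inputLayout = nullptr;
device->CreateInputLayout(elements, _countof(elements),
                          vsBytecode, vsBytecodeSize,
                          &inputLayout);          // check the HRESULT in real code
m_Context->IASetInputLayout(inputLayout);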

Bind the buffers to their slots (preferably all in one call):

ID3D11Buffer* vbs[] = { positions, texcoords, normals };
unsigned int strides[] = { sizeof(float) * 3, sizeof(float) * 2, sizeof(float) * 3 }; // per-vertex sizes matching the DXGI formats above
unsigned int offsets[] = { 0, 0, 0 };
m_Context->IASetVertexBuffers(0, 3, vbs, strides, offsets);

Draw as usual. You don't need to change the HLSL code (the shader sees the vertex attributes exactly as if they came from a single buffer).
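
Since the question asks about index buffers: a short sketch of the draw call, assuming indexBuffer, indexCount and the inputLayout from above already exist. Note that D3D11 uses a single index per vertex across all bound streams:

// Sketch only: indexBuffer, indexCount and inputLayout are assumed to exist.
m_Context->IASetInputLayout(inputLayout);
m_Context->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);
m_Context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
m_Context->DrawIndexed(indexCount, 0, 0);
// One index addresses the same element in every bound slot: index i fetches
// positions[i], texcoords[i] and normals[i] together.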

Note that the code snippets were written on the fly and may contain mistakes.

Edit: you can improve this approach by combining buffers according to update rate: if texcoords and normals never change, merge them into one buffer.

As for performance

It is all about locality of reference: the closer the data, the faster the access.

An interleaved buffer, in most cases, gives (by far) better performance on the GPU side (i.e. rendering): for each vertex, all of its attributes sit next to each other. But separate buffers give faster CPU access: each array is contiguous, so successive writes touch neighbouring memory.

So, overall, the performance concern depends on how often you write to the buffers. If your limiting factor is CPU writes, stick to separate buffers. If not, go for a single interleaved one.

How will you know? There is only one way: profile. Both the CPU side and the GPU side (via the graphics debugger/profiler from your GPU vendor).

Other factors

The best practice is to limit CPU writes, so if you find that you are limited by buffer updates, you probably need to review your approach. Do we really need to update a buffer every frame if we are running at 500 fps? The user won't see a difference if you reduce the buffer update rate to 30-60 times per second (decoupling buffer updates from frame updates), as sketched below. So, if your update strategy is reasonable, you will likely never be CPU-limited, and the best approach is classic interleaving.
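
A minimal sketch, under assumed names (deltaTime, accumulatedTime, newData, UpdatePositionsIfDue), of what decoupling buffer updates from the frame rate could look like for a dynamic buffer:

// Hypothetical sketch: throttle CPU writes to ~30 Hz instead of once per frame.
// Assumes the buffer was created with Usage = D3D11_USAGE_DYNAMIC and
// CPUAccessFlags = D3D11_CPU_ACCESS_WRITE; the names used are illustrative only.
#include <cstring>

void UpdatePositionsIfDue(ID3D11DeviceContext* context, ID3D11Buffer* positions,
                          const void* newData, size_t byteSize,
                          float deltaTime, float& accumulatedTime)
{
    const float kUpdateInterval = 1.0f / 30.0f;  // 30 buffer updates per second
    accumulatedTime += deltaTime;
    if (accumulatedTime < kUpdateInterval)
        return;                                  // skip the CPU write this frame
    accumulatedTime -= kUpdateInterval;

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(context->Map(positions, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        std::memcpy(mapped.pData, newData, byteSize);
        context->Unmap(positions, 0);
    }
}
// Rendering still happens every frame; only the CPU-side writes are throttled.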

You can also consider redesigning your data pipeline, or even preparing the data offline (we call it "baking"), so you won't need to cope with non-interleaved buffers at all. That would be quite reasonable too.

Reduce memory footprint or increase performance?

The memory-versus-performance tradeoff: the eternal question. Duplicate data to take advantage of interleaving, or not?

The answer is... "it depends". Are you writing the next CryEngine, targeting top-end GPUs with gigabytes of memory? Or are you programming for an embedded or mobile platform where memory is slow and limited? Is one megabyte of memory worth the hassle at all? Or do you have huge models, 100 MB each? We don't know.

It's all up to you to decide. But remember: there is no free candy. If you find the memory savings worth the performance loss, do it. Profile and compare to be sure.

Hope it helps somehow. Happy coding! =)
