Cache Friendly Vertex Definition


Problem description

I am writing an OpenGL application. For vertices, normals, and colors I use separate buffers, as follows:

GLuint vertex_buffer, normal_buffer, color_buffer;

My supervisor tells me that if I define a struct like:

struct vertex {
    glm::vec3 pos;
    glm::vec3 normal;
    glm::vec3 color;
};
GLuint vertex_buffer;

and then define a buffer of these vertices, my application will get much faster, because when the position is cached, the normal and color will be in the same cache line.

I think that defining such a struct does not affect performance that much. With the struct, fewer vertices fit in each cache line; with separate buffers, there are 3 different cache lines in the cache, one each for positions, normals, and colors. So nothing has changed. Is that true?

Solution

First of all, using separate buffers for different vertex attributes may not be a good technique.

A very important factor here is the GPU architecture. Most (especially modern) GPUs have several caches (for Input Assembler stage data, uniforms, textures), but fetching input attributes from multiple VBOs can be inefficient anyway (always profile!). Defining them in an interleaved format can help improve performance, and that is what you would get if you used such a struct.
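As a rough sketch of what the interleaved layout looks like in bytes, the block below computes the stride and per-attribute offsets of the struct from the question. `Vec3` is a stand-in for `glm::vec3` (assumed here to be three tightly packed floats), and the attribute locations 0, 1, and 2 in the comments are assumptions, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Stand-in for glm::vec3 (assumption: three tightly packed floats).
struct Vec3 { float x, y, z; };

// Same member order as the struct in the question.
struct Vertex {
    Vec3 pos;
    Vec3 normal;
    Vec3 color;
};

// Byte distance between two consecutive vertices in the interleaved buffer.
constexpr std::size_t kStride = sizeof(Vertex);

// Byte offset of each attribute inside one vertex.
constexpr std::size_t kPosOffset    = offsetof(Vertex, pos);
constexpr std::size_t kNormalOffset = offsetof(Vertex, normal);
constexpr std::size_t kColorOffset  = offsetof(Vertex, color);

static_assert(kStride == 9 * sizeof(float),
              "Vertex is expected to contain no padding");

// With a current GL context and the single vertex_buffer bound to
// GL_ARRAY_BUFFER, the attributes could be described roughly like this
// (locations 0..2 are hypothetical):
//   glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, kStride,
//                         reinterpret_cast<const void*>(kPosOffset));
//   glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, kStride,
//                         reinterpret_cast<const void*>(kNormalOffset));
//   glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, kStride,
//                         reinterpret_cast<const void*>(kColorOffset));
//   glEnableVertexAttribArray(0);  // likewise for 1 and 2
```

All three attributes share one stride and differ only in their starting offset, which is what keeps the data of a single vertex contiguous in memory.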

However, that's not always true (again, always profile!): although interleaved data is more GPU-friendly, it needs to be properly aligned and can take more space in memory, for example when padding is needed.
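To illustrate the space cost, here is a small sketch with a hypothetical `MixedVertex` that mixes floats with byte-sized color channels. It is not the struct from the question (which holds only floats and so should need no padding); it only shows how alignment can add bytes to an interleaved vertex on common platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vertex mixing 4-byte floats with 1-byte color channels.
struct MixedVertex {
    float        pos[3];    // 12 bytes
    std::uint8_t color[3];  //  3 bytes
    // The compiler typically appends 1 byte of tail padding here so that
    // the floats of the next vertex in an array stay 4-byte aligned.
};

// Bytes per vertex when interleaved (includes any padding).
constexpr std::size_t kInterleavedBytes = sizeof(MixedVertex);

// Bytes per vertex when each attribute lives in its own tightly packed array.
constexpr std::size_t kSeparateBytes =
    sizeof(float[3]) + sizeof(std::uint8_t[3]);

// Extra bytes per vertex paid for interleaving this particular layout.
constexpr std::size_t padding_per_vertex() {
    return kInterleavedBytes - kSeparateBytes;
}
```

On typical desktop compilers this gives 16 bytes interleaved against 15 bytes in separate arrays. Note that a GPU may impose its own alignment rules on attributes in separate buffers as well, so the real difference is worth measuring.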

But, in general:

Interleaved data formats:

  • Cause less GPU cache pressure, because the vertex coordinates and attributes of a single vertex aren't scattered all over memory. They fit consecutively into a few cache lines, whereas scattered attributes could cause more cache updates and therefore more evictions. The worst case could be one (attribute) element per cache line at a time because of distant memory locations, while vertices get pulled in a non-deterministic/non-contiguous manner, where possibly no prediction or prefetching kicks in. GPUs are very similar to CPUs in this respect.

  • Are also very useful for various external formats that match the deprecated interleaved formats, where datasets from compatible data sources can be read straight into mapped GPU memory. I ended up re-implementing these interleaved formats with the current API for exactly those reasons.

  • Should be laid out in an alignment-friendly way, just like simple arrays. Mixing data types with different size/alignment requirements may need padding to be GPU and CPU friendly. This is the only downside I know of, apart from the more difficult implementation.

  • Do not prevent you from pointing to single attrib arrays in them for sharing.
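The cache-pressure point can be made concrete with a little arithmetic. The sketch below counts how many cache lines are touched when a single vertex is fetched by index, under loudly stated assumptions: a 64-byte cache line (real GPU line sizes vary), buffers that start on a cache-line boundary, separate buffers that are far enough apart never to share a line, and 12-byte `vec3` attributes as in the question.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLine   = 64;              // assumption, hardware dependent
constexpr std::size_t kAttrBytes   = 12;              // one vec3 of floats
constexpr std::size_t kVertexBytes = 3 * kAttrBytes;  // pos + normal + color

// Number of cache lines overlapped by the byte range [offset, offset + size).
constexpr std::size_t lines_touched(std::size_t offset, std::size_t size) {
    return (offset + size - 1) / kCacheLine - offset / kCacheLine + 1;
}

// Lines touched when fetching vertex i from one interleaved buffer.
constexpr std::size_t interleaved_lines(std::size_t i) {
    return lines_touched(i * kVertexBytes, kVertexBytes);
}

// Lines touched when fetching vertex i from three separate buffers
// (one lookup at the same element offset in each buffer).
constexpr std::size_t separate_lines(std::size_t i) {
    return 3 * lines_touched(i * kAttrBytes, kAttrBytes);
}
```

Under these assumptions a randomly indexed vertex touches 1 or 2 lines when interleaved and 3 to 6 lines with separate buffers. For a purely sequential pass over all vertices the total number of bytes read is the same in both layouts, which is the case the question describes; the difference shows up when vertices are pulled in a non-contiguous order.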

Source

Further reading:

Best Practices for Working with Vertex Data

Vertex Specification Best Practices
