Which is most cache friendly?


Question

I am trying to get a good grip on data-oriented design and how to program best with the cache in mind. There are basically two scenarios, and I cannot quite decide which is better and why: is it better to have a vector of objects, or several vectors holding the objects' atomic data?

A) Vector of objects example

struct A
{
    GLsizei mIndices;
    GLuint mVBO;
    GLuint mIndexBuffer;
    GLuint mVAO;

    size_t vertexDataSize;
    size_t normalDataSize;
};

std::vector<A> gMeshes;

for (const A& mesh : gMeshes)
{
    glBindVertexArray(mesh.mVAO);
    glDrawElements(GL_TRIANGLES, mesh.mIndices, GL_UNSIGNED_INT, 0);
    glBindVertexArray(0);

    ....
}



B) Vectors with the atomic data

std::vector<GLsizei> gIndices;
std::vector<GLuint> gVBOs;
std::vector<GLuint> gIndexBuffers;
std::vector<GLuint> gVAOs;
std::vector<size_t> gVertexDataSizes;
std::vector<size_t> gNormalDataSizes;

size_t numMeshes = ...;

for (size_t index = 0; index < numMeshes; ++index)
{
    glBindVertexArray(gVAOs[index]);
    glDrawElements(GL_TRIANGLES, gIndices[index], GL_UNSIGNED_INT, 0);
    glBindVertexArray(0);

    ....
}

Which one is more memory efficient and cache friendly, resulting in fewer cache misses and better performance, and why?

Recommended answer

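Cache works as follows:


  • if the data is already in cache, then it is fast to access

  • if the data is not in cache, then you incur a cost, but an entire cache line (or page, if we're talking RAM vs swap file rather than cache vs RAM) is brought in, so accesses close to the missed address will not miss.

  • if you're lucky, then the memory subsystem will detect sequential access and prefetch the data it thinks you are about to need.

So, naively, the questions to ask are:


  1. how many cache misses occur? B wins, because in A you fetch some unused data with every record, whereas in B you fetch nothing unused apart from a small rounding error at the end of the iteration (the partly used last cache line of each vector). So, to visit all the necessary data, B fetches fewer cache lines, assuming a significant number of records. If the number of records is insignificant, then cache performance may have little or nothing to do with the performance of your code, because a program that uses a small enough amount of data will find that it is all in cache all the time.

  2. is the access sequential? Yes in both cases, although it might be harder to detect in case B because there are two interleaved sequences rather than just one.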
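The cache-line count behind question 1 can be estimated with a little arithmetic. The following is only a rough sketch under stated assumptions: a 64-byte cache line, arrays that start on a cache-line boundary, and 32-bit stand-ins for `GLsizei` and `GLuint` so that it compiles without OpenGL headers. The names `MeshA`, `linesFor`, `linesTouchedA`, and `linesTouchedB` are hypothetical and not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common, but hardware dependent).
constexpr std::size_t kCacheLineBytes = 64;

// Stand-in for struct A from the question; the GL types are replaced by
// 32-bit integers so the sketch compiles without OpenGL headers.
struct MeshA
{
    std::int32_t  mIndices;
    std::uint32_t mVBO;
    std::uint32_t mIndexBuffer;
    std::uint32_t mVAO;

    std::size_t vertexDataSize;
    std::size_t normalDataSize;
};

// The estimate for layout A assumes a record is no larger than a cache line,
// so every line of the array contains fields that the loop reads.
static_assert(sizeof(MeshA) <= kCacheLineBytes, "record larger than a line");

// Number of cache lines covering `bytes` contiguous bytes that start on a
// cache-line boundary (best-case alignment).
constexpr std::size_t linesFor(std::size_t bytes)
{
    return (bytes + kCacheLineBytes - 1) / kCacheLineBytes;
}

// Layout A: the loop walks whole records, so the entire array is fetched.
inline std::size_t linesTouchedA(std::size_t numMeshes)
{
    assert(numMeshes <= SIZE_MAX / sizeof(MeshA)); // guard the multiplication
    return linesFor(numMeshes * sizeof(MeshA));
}

// Layout B: only the two vectors the loop reads (gVAOs and gIndices).
inline std::size_t linesTouchedB(std::size_t numMeshes)
{
    assert(numMeshes <= SIZE_MAX / sizeof(std::uint32_t));
    return linesFor(numMeshes * sizeof(std::uint32_t))
         + linesFor(numMeshes * sizeof(std::int32_t));
}
```

On a typical 64-bit platform where `sizeof(MeshA)` is 32, 1000 meshes give 500 lines for A and 126 lines for B under these assumptions. This counts lines fetched, not time: with good prefetching the difference in run time may be much smaller.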

So, I would sort of expect B to be faster for this code. However:


  • if this is the only access to the data, then you could speed up A by removing most of the data members from the struct. So do that. Presumably in fact it is not the only access to the data in your program, and the other accesses might affect performance in two ways: the time they actually take, and whether they populate the cache with the data you need.
  • what I expect and what actually happens are frequently different things, and there is little point relying on speculation if you have any ability to test it. In the best case, the sequential access means that there are no cache misses in either code. Testing performance requires no special tools (although they can make it easier), just a clock with a second hand. At a pinch, fashion a pendulum from your phone charger.
  • there are some complications I have ignored. Depending on hardware, if you're unlucky with B then at the lowest cache level you could find that the accesses to one vector are evicting the accesses to the other vector, because the corresponding memory just happens to use the same location in cache. This would cause two cache misses per record. This will only happen on what's called "direct-mapped cache". "Two-way cache" or better would save the day, by allowing chunks of both vectors to co-exist even if their first preference location in cache is the same. I don't think that PC hardware generally uses direct-mapped cache, but I don't know for sure and I don't know much about GPUs.
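For the "test it" point, a stopwatch in code is usually more convenient than a pendulum. The sketch below is one possible way to time the two layouts, not a definitive benchmark. It assumes the draw calls can be replaced by a simple sum over the same two fields, uses 32-bit stand-ins for the GL types, and every name in it (`MeshA`, `MeshesB`, `makeMeshesA`, `makeMeshesB`, `touchA`, `touchB`, `millisecondsFor`) is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout A: stand-in for struct A (GL types replaced by 32-bit integers).
struct MeshA
{
    std::int32_t  mIndices;
    std::uint32_t mVBO;
    std::uint32_t mIndexBuffer;
    std::uint32_t mVAO;
    std::size_t   vertexDataSize;
    std::size_t   normalDataSize;
};

// Layout B: only the two vectors the loop reads.
struct MeshesB
{
    std::vector<std::int32_t>  indices; // plays the role of gIndices
    std::vector<std::uint32_t> vaos;    // plays the role of gVAOs
};

std::vector<MeshA> makeMeshesA(std::size_t n)
{
    std::vector<MeshA> meshes(n); // value-initialized, so unused fields are 0
    for (std::size_t i = 0; i < n; ++i)
    {
        meshes[i].mIndices = static_cast<std::int32_t>(i);
        meshes[i].mVAO     = static_cast<std::uint32_t>(i);
    }
    return meshes;
}

MeshesB makeMeshesB(std::size_t n)
{
    MeshesB meshes;
    meshes.indices.resize(n);
    meshes.vaos.resize(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        meshes.indices[i] = static_cast<std::int32_t>(i);
        meshes.vaos[i]    = static_cast<std::uint32_t>(i);
    }
    return meshes;
}

// Stand-in for the draw loop: reads mVAO and mIndices of every record.
std::uint64_t touchA(const std::vector<MeshA>& meshes)
{
    std::uint64_t total = 0;
    for (const MeshA& mesh : meshes)
        total += static_cast<std::uint64_t>(mesh.mVAO)
               + static_cast<std::uint64_t>(mesh.mIndices);
    return total;
}

// Same work on layout B: two interleaved sequential reads.
std::uint64_t touchB(const MeshesB& meshes)
{
    assert(meshes.indices.size() == meshes.vaos.size());
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < meshes.vaos.size(); ++i)
        total += static_cast<std::uint64_t>(meshes.vaos[i])
               + static_cast<std::uint64_t>(meshes.indices[i]);
    return total;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double millisecondsFor(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `millisecondsFor([&] { sink = touchA(meshes); })` with a few million records, repeated several times for each layout, gives numbers to compare. Print or otherwise use `sink` afterwards so the compiler cannot discard the loop, and build with optimizations on. The result depends on the hardware and on what else the real program does with the data, so this only measures the isolated loop.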
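The direct-mapped conflict in the last point comes down to simple index arithmetic. Here is a minimal model of it, assuming a cache described only by a line size and a line count; the `DirectMappedCache` type and its members are hypothetical, and real caches (including set-associative ones) are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical direct-mapped cache geometry; real sizes depend on the hardware.
struct DirectMappedCache
{
    std::size_t lineBytes; // bytes per cache line
    std::size_t numLines;  // number of lines (slots) in the cache

    // Index of the memory line that contains `address`.
    std::uintptr_t lineOf(std::uintptr_t address) const
    {
        assert(lineBytes > 0);
        return address / lineBytes;
    }

    // The single slot that `address` is allowed to occupy.
    std::size_t slotOf(std::uintptr_t address) const
    {
        assert(numLines > 0);
        return static_cast<std::size_t>(lineOf(address) % numLines);
    }

    // True when the two addresses lie in different memory lines that compete
    // for the same slot, so loading one evicts the other.
    bool evictEachOther(std::uintptr_t a, std::uintptr_t b) const
    {
        return lineOf(a) != lineOf(b) && slotOf(a) == slotOf(b);
    }
};
```

With 64-byte lines and 512 slots (32 KiB in total), addresses `0x10000` and `0x10000 + 64 * 512` map to the same slot. If the current elements of two vectors in B happened to sit exactly that far apart, each access would evict the line the other vector needs next. A two-way cache offers two candidate places per index, which is why both lines can then stay resident.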
