Understanding how the CPU decides what gets loaded into cache memory

Views: 143 | Posted: 2020/5/21 20:37:30 | Tags: c++ caching optimization


Question

Let's say a computer has 64 KB of L1 cache and 512 KB of L2 cache.

The programmer has created and populated an array of, say, 10 MB of data in main memory (e.g. the vertex/index data of a 3D model).

The array might contain a series of structs like:

struct x
{
  vec3 pos;
  vec3 normal;
  vec2 texcoord;
};

Next, the programmer has to perform some operation on all of this data, e.g. a one-time normal computation, before passing the data over to the GPU.

How does the CPU decide how data gets loaded into L2 cache?

How can the programmer check what size a cache line is for any given architecture?

How can the programmer ensure that data is organised so that it fits into cache lines?

Is data alignment to byte boundaries the only thing that can be done to aid this process?

What can the programmer do to minimize cache misses?

What profiling tools are available that will help visualize the optimization process on the Windows and Linux platforms?

Recommended answer

There are a lot of questions here, so I will keep the answers brief.

How does the CPU decide how data gets loaded into L2 cache?

Whatever you use gets loaded. L2 behaves much like L1 except that there is more of it. Aliasing, where addresses that map to the same set evict one another prematurely, can occur at either level; how often it happens depends on the line size and set associativity of that cache, which vary between CPUs. Some CPUs only load L2 with data that is being pushed out of L1 (an exclusive, or victim, cache), but it doesn't make much difference to the programmer.
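To make the aliasing point concrete, here is a minimal sketch of how a set-associative cache might pick a set for an address. The geometry in the usage note (64 KB, 64-byte lines, 2-way) is an assumed example, not a description of any particular CPU, and `CacheGeometry`, `set_index` and `alias_stride` are hypothetical names. Real hardware may hash additional address bits, so treat this as a model only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cache geometry, used for illustration only.
struct CacheGeometry {
    std::size_t total_bytes;  // total capacity of the cache
    std::size_t line_bytes;   // bytes per cache line
    std::size_t ways;         // lines per set (associativity)

    std::size_t num_sets() const {
        return total_bytes / (line_bytes * ways);
    }
};

// Simple modulo indexing: drop the offset within the line, then take
// the line number modulo the number of sets.
inline std::size_t set_index(const CacheGeometry& g, std::uint64_t address) {
    assert(g.line_bytes > 0 && g.ways > 0 && g.num_sets() > 0);
    return static_cast<std::size_t>((address / g.line_bytes) % g.num_sets());
}

// Distance in bytes between two addresses that land in the same set.
inline std::size_t alias_stride(const CacheGeometry& g) {
    return g.line_bytes * g.num_sets();
}
```

With `CacheGeometry g{64 * 1024, 64, 2}` this model has 512 sets and an alias stride of 32768 bytes, so three buffers walked in lockstep at offsets that are multiples of 32 KB apart would compete for the two ways of one set.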

Most MMUs have a facility for marking memory as uncached, but this is meant for device drivers. I don't recall ever seeing an option to disable L2 without also disabling L1. With no caching, you get no performance.

How can the programmer check what size a cache line is for any given architecture?

By consulting the processor's manual. Some operating systems also provide a query facility such as sysctl.
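As one possible way to query this at run time, the sketch below asks glibc on Linux through `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)`, which is a glibc extension and may report 0 or -1 on some systems. The 64-byte fallback is an assumption (a common value on x86-64), and `cache_line_size` and `round_up_to_line` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__linux__)
#include <unistd.h>
#endif

// Returns the L1 data cache line size in bytes, or a fallback guess
// when the operating system cannot report it.
inline std::size_t cache_line_size() {
    const std::size_t fallback = 64;  // assumption: common on x86-64
#if defined(__linux__) && defined(_SC_LEVEL1_DCACHE_LINESIZE)
    const long reported = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (reported > 0) {
        return static_cast<std::size_t>(reported);
    }
#endif
    return fallback;
}

// Rounds n up to a whole number of cache lines, e.g. to size a buffer.
inline std::size_t round_up_to_line(std::size_t n) {
    const std::size_t line = cache_line_size();
    assert(line > 0);
    return (n + line - 1) / line * line;
}
```

On other platforms the fallback is returned unchanged, so the value should be checked against the processor documentation before relying on it.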

How can the programmer ensure that data is organised so that it fits into cache lines?

The key idea is spatial locality. Data that is accessed at the same time, by the same inner loop, should go into the same data structure. The optimal organization is to fit that structure into a cache line and align it to the cache line size.
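A minimal sketch of that idea, applied to the struct from the question: with 4-byte floats the vertex is 32 bytes, so aligning it to 32 bytes means no vertex straddles two lines when lines are 64 bytes. The 64-byte line size is an assumption, the `Vec3`/`Vec2` layouts are guesses at the question's `vec3`/`vec2`, and `fits_in_one_line` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

// Assumed line size for this sketch; verify it for the target CPU.
constexpr std::size_t kAssumedLineBytes = 64;

// 8 floats = 32 bytes. Aligning to 32 keeps each vertex inside a
// single line when lines are 64 bytes long.
struct alignas(32) Vertex {
    Vec3 pos;
    Vec3 normal;
    Vec2 texcoord;
};

static_assert(sizeof(Vertex) == 32, "unexpected padding in Vertex");
static_assert(kAssumedLineBytes % sizeof(Vertex) == 0,
              "a Vertex could straddle two cache lines");

// True if the bytes [p, p + size) all lie within one cache line.
inline bool fits_in_one_line(const void* p, std::size_t size) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr / kAssumedLineBytes == (addr + size - 1) / kAssumedLineBytes;
}
```

The `static_assert`s turn a layout change (an added field, different padding) into a compile error instead of a silent slowdown. Heap containers of over-aligned types only honour the alignment from C++17 onwards.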

Don't go to the trouble unless you are carefully using your profiler as a guide.

Is data alignment to byte boundaries the only thing that can be done to aid this process?

No, the other part is to avoid filling the cache with extraneous data. If some fields are only going to be used by some other algorithm, they waste cache space while the present algorithm runs. But you can't optimize everything all the time, and reorganizing data structures takes programming effort.
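One common way to keep unused fields out of the cache is to store each field in its own array, so a pass that needs only one field streams only that field. The sketch below assumes a pass that merely rescales existing normals; with the interleaved 32-byte struct from the question, such a pass would pull 32 bytes per vertex through the cache while using 12. `MeshArrays` and `normalize_normals` are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

// One array per field; index i across the arrays describes vertex i.
struct MeshArrays {
    std::vector<Vec3> pos;
    std::vector<Vec3> normal;
    std::vector<Vec2> texcoord;  // never touched by the pass below
};

// Rescales every normal to unit length. Only the normal array is read
// and written, so positions and texcoords stay out of the cache.
inline void normalize_normals(MeshArrays& mesh) {
    for (Vec3& n : mesh.normal) {
        const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        if (len > 0.0f) {  // leave zero-length normals untouched
            n.x /= len;
            n.y /= len;
            n.z /= len;
        }
    }
}
```

The trade-off is that code needing all fields of one vertex now touches three separate lines, so whether the split helps depends on which passes dominate; that is a question for the profiler.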

What can the programmer do to minimize cache misses?

Profile using real-world data, and treat excessive misses as a bug.

What profiling tools are available that will help visualize the optimization process on the Windows and Linux platforms?

Cachegrind is very nice, but it runs your program on a simulated CPU (Valgrind's virtual machine) rather than measuring the real hardware. Intel VTune uses your actual hardware, for better or worse. I haven't used the latter.
