How are the modern Intel CPU L3 caches organized?


Problem description

Given that CPUs are now multi-core and have their own L1/L2 caches, I was curious as to how the L3 cache is organized, given that it is shared by multiple cores. I would imagine that if we had, say, 4 cores, then the L3 cache would contain 4 pages' worth of data, each page corresponding to the region of memory that a particular core is referencing. Assuming I'm somewhat correct, is that as far as it goes? It could, for example, divide each of these pages into sub-pages, so that when multiple threads run on the same core, each thread may find its data in one of the sub-pages. I'm just coming up with this off the top of my head, so I'm very interested in educating myself on what is really going on under the hood. Can anyone share their insights or provide me with a link that will cure me of my ignorance?

Many thanks.

Recommended answer

There is a single (sliced) L3 cache in a single-socket chip, and several L2 caches (one per real physical core). The L3 caches data in segments of 64 bytes (cache lines), and there is a special cache-coherence protocol between the L3 and the different L2/L1 caches (and between several chips in NUMA/ccNUMA multi-socket systems too); it tracks which copy of a cache line is current, which is shared between several caches, and which has just been modified (and should be invalidated in the other caches). Some of the protocols (possible cache-line states and state transitions): https://en.wikipedia.org/wiki/MESI_protocol, https://en.wikipedia.org/wiki/MESIF_protocol, https://en.wikipedia.org/wiki/MOESI_protocol
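Because coherence is tracked at 64-byte granularity, unrelated variables that happen to land on the same line can ping-pong between cores' private caches ("false sharing"). A minimal C sketch, assuming Linux/glibc (the _SC_LEVEL3_CACHE_LINESIZE sysconf key is a glibc extension and may report 0 on some machines), that queries the reported line size and pads per-thread data out to a full line:

```c
/* cacheline.c - query cache sizes and pad per-thread data so that each
 * counter owns its own 64-byte cache line.  Build: gcc -O2 cacheline.c */
#include <stdalign.h>
#include <stdio.h>
#include <unistd.h>

#define LINE 64                        /* typical Intel cache-line size */

struct padded_counter {
    alignas(LINE) unsigned long value; /* forces sizeof == alignment == 64 */
};

static struct padded_counter counters[4];   /* e.g. one per core/thread */

int main(void)
{
    /* glibc extension; may return 0 or -1 if the value is unknown */
    long l3_line = sysconf(_SC_LEVEL3_CACHE_LINESIZE);
    long l3_size = sysconf(_SC_LEVEL3_CACHE_SIZE);

    printf("reported L3 line size: %ld bytes, L3 size: %ld bytes\n",
           l3_line, l3_size);
    printf("sizeof(struct padded_counter) = %zu\n", sizeof counters[0]);

    counters[0].value++;   /* writes by different threads to different
                              counters now touch different cache lines,
                              so they do not invalidate each other */
    return 0;
}
```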

In older chips (the Core 2 era) cache coherence was snooped on a shared bus; now it is checked with the help of a directory.
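A conceptual C sketch of what a directory buys you; this illustrates the idea only, not Intel's actual (undocumented) layout. Instead of broadcasting a snoop to every cache, the home agent consults a per-line record of which cores may hold a copy:

```c
/* Conceptual directory entry for one cache line (illustration only). */
#include <stdint.h>

enum line_state { INVALID, SHARED, MODIFIED };  /* simplified MSI-style states */

struct dir_entry {
    enum line_state state;
    uint32_t        sharers;   /* bit i set => core i may hold a copy */
};

/* On a write by `core`, only the cores whose bit is set need an
 * invalidation message, instead of snooping every cache on a shared bus. */
static uint32_t invalidation_targets(struct dir_entry *e, int core)
{
    uint32_t targets = e->sharers & ~(1u << core);
    e->sharers = 1u << core;
    e->state   = MODIFIED;
    return targets;
}
```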

In real life the L3 is not just a "single" cache but is sliced into several slices, each with its own high-speed access port. There is some method of selecting a slice based on the physical address, which allows a multicore system to perform many accesses at any moment (each access is directed by an undocumented hash function to some slice; when two cores use the same physical address, their accesses are served by the same slice, or by slices that perform the cache-coherence-protocol checks). Information about the L3 cache slices has been reverse-engineered in several papers (a sketch of the hash's shape follows the list):

  • https://cmaurice.fr/pdf/raid15_maurice.pdf Reverse Engineering Intel Last-Level Cache Complex Addressing Using Performance Counters
  • https://eprint.iacr.org/2015/690.pdf Systematic Reverse Engineering of Cache Slice Selection in Intel Processors
  • https://arxiv.org/pdf/1508.03767.pdf Cracking Intel Sandy Bridge’s Cache Hash Function
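The hash functions recovered in those papers all share the same shape: each bit of the slice index is the XOR (parity) of a fixed subset of physical-address bits. The C sketch below shows only that shape; the masks are made-up placeholders, and the real per-microarchitecture masks are tabulated in the papers above.

```c
/* slice_hash.c - shape of the LLC slice-selection hash: each slice-index
 * bit is the parity of selected physical-address bits.  The masks below
 * are ILLUSTRATIVE placeholders, not Intel's real (undocumented) values. */
#include <stdint.h>
#include <stdio.h>

static unsigned parity64(uint64_t x)
{
    return (unsigned)__builtin_parityll(x);   /* GCC/Clang builtin */
}

/* Example: 4 slices -> 2 output bits, each derived from its own bit mask
 * (low 6 bits are never used: they are the offset within a cache line). */
static unsigned slice_index(uint64_t phys_addr)
{
    const uint64_t mask_bit0 = 0x000000004C8A6D40ull;  /* placeholder */
    const uint64_t mask_bit1 = 0x0000000072B5C3C0ull;  /* placeholder */

    return  parity64(phys_addr & mask_bit0)
         | (parity64(phys_addr & mask_bit1) << 1);
}

int main(void)
{
    for (uint64_t a = 0; a < 4 * 64; a += 64)           /* walk cache lines */
        printf("addr %#06llx -> slice %u\n",
               (unsigned long long)a, slice_index(a));
    return 0;
}
```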

With recent chips the programmer has the ability to partition the L3 cache between applications using "Cache Allocation Technology" (v4 family): https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology, https://software.intel.com/en-us/articles/introduction-to-code-and-data-prioritization-with-usage-models, https://danluu.com/intel-cat/, https://lwn.net/Articles/659161/
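On Linux, CAT is exposed through the resctrl filesystem described in the kernel's x86 resctrl documentation. The sketch below assumes a single-socket machine with resctrl already mounted and root privileges; it creates a resource group named "lowprio" (a hypothetical name), limits it to an example 4-way bitmask on L3 cache id 0, and moves the current process into it. Valid bitmask widths are CPU-specific (see /sys/fs/resctrl/info/L3/cbm_mask); this is a sketch, not production code.

```c
/* cat_partition.c - reserve part of the L3 for one process via the Linux
 * resctrl interface to Intel CAT.  Run as root, with resctrl mounted:
 *     mount -t resctrl resctrl /sys/fs/resctrl                         */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static void write_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (!f || fputs(text, f) == EOF) { perror(path); exit(1); }
    fclose(f);
}

int main(void)
{
    char buf[64];

    /* 1. Create a resource group: just a directory under /sys/fs/resctrl. */
    if (mkdir("/sys/fs/resctrl/lowprio", 0755) != 0 && errno != EEXIST) {
        perror("mkdir /sys/fs/resctrl/lowprio");
        return 1;
    }

    /* 2. Restrict the group to an example 4-bit capacity bitmask on L3
     *    cache id 0 (bits must be contiguous; on multi-socket systems
     *    list every cache id, e.g. "L3:0=00f;1=00f"). */
    write_file("/sys/fs/resctrl/lowprio/schemata", "L3:0=00f\n");

    /* 3. Move this process into the group; its L3 fills are now limited
     *    to the ways selected above. */
    snprintf(buf, sizeof buf, "%d\n", (int)getpid());
    write_file("/sys/fs/resctrl/lowprio/tasks", buf);

    puts("L3 partition applied to this process");
    return 0;
}
```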

