What is the benefit of the MOESI cache coherency protocol over MESI?

Question

I was wondering what benefits MOESI has over the MESI cache coherency protocol, and which protocol is currently favored for modern architectures. Oftentimes benefits don't translate to implementation if the costs don't allow it. Quantitative performance results of MOESI over MESI would be nice to see also.

Answer

AMD uses MOESI, Intel uses MESIF. (I don't know about non-x86 cache details.)

MOESI allows sending dirty cache lines directly between caches instead of writing them back to a shared outer cache and then reading them from there. The linked wiki article has a bit more detail, but it's basically about sharing dirty data. The Owned state keeps track of which cache is responsible for writing back the dirty data.
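
A toy way to see the difference is to model one cache line held by a few private caches and count memory traffic. The sketch below is a hypothetical Python model, not how any real AMD or Intel CPU implements its protocol: it assumes atomic snooping transactions, no shared outer cache, and that memory supplies the data whenever no cache holds a dirty copy. `Line` and `run` are illustrative names only.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"      # only reachable when moesi=True
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


DIRTY = (State.M, State.O)


class Line:
    """Toy snooping model of ONE cache line held by several private caches.

    Assumptions: atomic bus transactions, no shared outer cache, and memory
    supplies the data whenever no cache holds a dirty copy.
    """

    def __init__(self, n_caches: int, moesi: bool) -> None:
        self.states = [State.I] * n_caches
        self.moesi = moesi
        self.mem_reads = 0
        self.mem_writes = 0

    def _others(self, c: int) -> list:
        """Indices of the other caches that hold a valid copy."""
        return [i for i, s in enumerate(self.states) if i != c and s is not State.I]

    def read(self, c: int) -> None:
        if self.states[c] is not State.I:
            return  # read hit: no bus traffic
        others = self._others(c)
        owners = [i for i in others if self.states[i] in DIRTY]
        if owners:
            if self.moesi:
                # MOESI: the owner sends the dirty line cache-to-cache and
                # stays responsible for the eventual write-back.
                self.states[owners[0]] = State.O
            else:
                # MESI has no "dirty but shared" state, so memory must be
                # updated before the line can become Shared.
                self.mem_writes += 1
                self.states[owners[0]] = State.S
        else:
            self.mem_reads += 1           # clean everywhere: memory supplies data
            for i in others:
                self.states[i] = State.S  # E -> S; S stays S
        self.states[c] = State.S if others else State.E

    def write(self, c: int) -> None:
        if self.states[c] is State.I and not any(
            self.states[i] in DIRTY for i in self._others(c)
        ):
            # Write miss with no dirty copy elsewhere: fetch from memory.
            # Assumption: a dirty copy is handed over directly in both protocols.
            self.mem_reads += 1
        for i in self._others(c):
            self.states[i] = State.I  # invalidate every other copy
        self.states[c] = State.M

    def evict_all(self) -> None:
        # Only a cache holding the line in M or O writes it back.
        self.mem_writes += sum(s in DIRTY for s in self.states)
        self.states = [State.I] * len(self.states)


def run(moesi: bool) -> tuple:
    line = Line(3, moesi)
    line.write(0)   # core 0 dirties the line
    line.read(1)    # core 1 reads the dirty line
    line.read(2)    # core 2 reads it too
    line.write(0)   # core 0 writes again, invalidating the others
    line.read(1)    # core 1 reads the new dirty data
    line.evict_all()
    return line.mem_reads, line.mem_writes


print("MESI  (reads, writes):", run(moesi=False))  # (2, 2)
print("MOESI (reads, writes):", run(moesi=True))   # (1, 1)
```

In this model MESI pays a memory write every time a dirty line gets a new reader, because it has no state for "dirty but shared", while MOESI defers the single write-back until the Owned copy is evicted. Real implementations add many details (write-back buffers, outer caches, directories) that this sketch ignores.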

MESIF allows caches to Forward a copy of a clean cache line to another cache, instead of other caches having to re-read it from memory to get another Shared copy. (Intel since Nehalem already uses a single large shared L3 cache for all cores, so all requests are ultimately backstopped by one L3 cache before checking memory anyway, but that only covers the cores on one socket. Forwarding applies between sockets in a multi-socket system. Until Skylake-AVX512, the large shared L3 cache was inclusive; see "Which cache mapping technique is used in intel core i7 processor?")
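
Forwarding of clean lines can be sketched with a similar toy model. It is a hypothetical simplification, not Intel's actual protocol: it assumes that without forwarding, Shared copies never answer a request, so every read miss goes to memory; with forwarding, the single E or F holder answers cache-to-cache, the requester takes over F, and the old responder drops to S. `CleanLine` and `run` are illustrative names.

```python
from enum import Enum


class State(Enum):
    E = "Exclusive"
    S = "Shared"
    F = "Forward"   # only used when forward=True
    I = "Invalid"


class CleanLine:
    """Toy model of read-only sharing of ONE clean cache line.

    Assumptions: with forward=False, Shared copies never answer a request,
    so every read miss goes to memory. With forward=True, the single E or F
    holder answers cache-to-cache, the requester takes over F, and the old
    responder drops to S.
    """

    def __init__(self, n_caches: int, forward: bool) -> None:
        self.states = [State.I] * n_caches
        self.forward = forward
        self.mem_reads = 0

    def read(self, c: int) -> None:
        if self.states[c] is not State.I:
            return  # read hit
        holders = [i for i, s in enumerate(self.states) if s is not State.I]
        responders = [i for i in holders if self.states[i] in (State.E, State.F)]
        if self.forward and responders:
            # Exactly one cache answers; memory is not touched.
            self.states[responders[0]] = State.S
            self.states[c] = State.F
            return
        self.mem_reads += 1
        for i in holders:
            self.states[i] = State.S  # E -> S; S stays S
        if not holders:
            self.states[c] = State.E
        else:
            # Assumption: the cache that fetched from memory becomes the forwarder.
            self.states[c] = State.F if self.forward else State.S

    def evict(self, c: int) -> None:
        self.states[c] = State.I  # clean line: silent drop, no write-back


def run(forward: bool) -> int:
    line = CleanLine(4, forward)
    for core in (0, 1, 2):
        line.read(core)
    line.evict(2)   # with forwarding, core 2 holds the F copy at this point
    line.read(3)
    return line.mem_reads


print("no forwarding:", run(forward=False))  # 4
print("with F state: ", run(forward=True))   # 2
```

The last read shows a limitation of this model: once the F copy is evicted, the next reader goes back to memory even though other caches still hold Shared copies.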

Wikipedia's MESIF article (linked above) has some comparison between MOESI and MESIF.

AMD in some cases has lower latency for sharing the same cache line between 2 cores. For example, see this graph of inter-core latency for Ryzen vs. quad-core Intel vs. many-core Intel (ring bus: Broadwell) vs. Skylake-X (worst).

Obviously there are many other differences between Intel and AMD designs that affect inter-core latency, like Intel using a ring bus or mesh, and AMD using a crossbar / all-to-all design with small clusters. (e.g. Ryzen has clusters of 4 cores that share an L3. That's why the inter-core latency for Ryzen has another step from core #3 to core #4.)

BTW, notice that for both Intel and AMD, the latency between two logical cores on the same physical core is much lower. See "What are the latency and throughput costs of producer-consumer sharing of a memory location between hyper-siblings versus non-hyper siblings?"

I didn't look for any academic papers that simulated MESI vs. MOESI on an otherwise-similar model.

The choice of MESIF vs. MOESI can be influenced by other design factors. Intel's use of a large tag-inclusive shared L3 cache as a backstop for coherency traffic is their solution to the same problem that MOESI solves: when a core has a line in Modified state in its private L2 or L1d, traffic between cores is still handled efficiently by writing the line back to L3 and then sending the data from L3 to the requesting core.
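
The inclusive-L3 backstop can be sketched the same way. This is a hypothetical Python model, not Intel's actual design: it assumes a single socket, that every privately cached line also has an L3 tag, and that the L3 keeps a per-line set of cores that may hold the line (presence bits), so it snoops only those cores. `InclusiveL3` and `L3Entry` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class L3Entry:
    dirty: bool = False                              # L3 copy is newer than DRAM
    sharers: set[int] = field(default_factory=set)   # cores that may hold the line
    modified_by: int | None = None                   # core holding it in M, if any


class InclusiveL3:
    """Toy model of a shared, tag-inclusive L3 used as a coherence backstop.

    Assumptions: one socket; every line in a private cache also has an L3
    tag; the L3 records which cores may hold each line, so it snoops only
    those cores instead of broadcasting.
    """

    def __init__(self) -> None:
        self.lines: dict[int, L3Entry] = {}
        self.dram_reads = 0
        self.dram_writes = 0
        self.snoops = 0

    def _lookup(self, addr: int) -> L3Entry:
        entry = self.lines.get(addr)
        if entry is None:
            # Inclusive tags: an L3 miss proves no core has the line,
            # so go straight to DRAM without snooping any core.
            self.dram_reads += 1
            entry = self.lines[addr] = L3Entry()
        return entry

    def read(self, core: int, addr: int) -> None:
        entry = self._lookup(addr)
        owner = entry.modified_by
        if owner is not None and owner != core:
            # Snoop only the recorded owner: it writes the line back into
            # L3 (not DRAM), keeps a Shared copy, and L3 supplies the data.
            self.snoops += 1
            entry.dirty = True
            entry.modified_by = None
        entry.sharers.add(core)

    def write(self, core: int, addr: int) -> None:
        entry = self._lookup(addr)
        others = entry.sharers - {core}
        self.snoops += len(others)  # invalidate exactly the recorded sharers
        if entry.modified_by in others:
            entry.dirty = True      # simplification: old owner's data goes via L3
        entry.sharers = {core}
        entry.modified_by = core

    def evict(self, addr: int) -> None:
        # Inclusive: evicting from L3 also back-invalidates private copies,
        # so a dirty line (in L3 or in a private cache) goes to DRAM.
        entry = self.lines.pop(addr)
        if entry.dirty or entry.modified_by is not None:
            self.dram_writes += 1


l3 = InclusiveL3()
l3.write(0, 0x40)   # core 0 gets the line in Modified state
l3.read(1, 0x40)    # core 1 reads it: one snoop, write-back to L3 only
l3.read(2, 0x40)    # core 2 hits the now up-to-date L3 copy
l3.read(3, 0x80)    # unrelated line: L3 tag miss, no snoop
print(l3.dram_reads, l3.dram_writes, l3.snoops)  # 2 0 1
l3.evict(0x40)
print(l3.dram_writes)  # 1
```

In this sketch the dirty line moves from core 0 to cores 1 and 2 through L3 with a single snoop and no DRAM write; DRAM is only updated when the L3 copy is finally evicted. The tag miss for `0x80` also goes straight to DRAM, because with inclusive tags no core can be holding that line.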

IIRC, some AMD designs (like some versions of Bulldozer-family) didn't have a last-level cache shared by all cores, and instead had larger L2 caches shared by pairs of cores. Higher-performance BD-family CPUs did also have a shared cache, though, so at least clean data could hit in L3.
