What is the benefit of the MOESI cache coherency protocol over MESI?


Problem Description


I was wondering what benefits MOESI has over the MESI cache coherency protocol, and which protocol is currently favored for modern architectures. Oftentimes benefits don't translate to implementation if the costs don't allow it. Quantitative performance results of MOESI over MESI would be nice to see also.

Solution

AMD uses MOESI, Intel uses MESIF. (I don't know about non-x86 cache details.)

MOESI allows sending dirty cache lines directly between caches instead of writing back to a shared outer cache and then reading from there. The linked Wikipedia article has a bit more detail, but it's basically about sharing dirty data. The Owned state keeps track of which cache is responsible for writing back the dirty data.
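
To make that concrete, here is a toy C++ state-machine sketch (my own illustration, not from the answer above; the enum and function names are made up) of what one cache does when it snoops a read request for a line it holds. The interesting case is Modified: with MOESI the holder forwards the dirty line cache-to-cache and drops to Owned, keeping responsibility for the eventual write-back, instead of writing back to memory first as plain MESI would.

```cpp
// Toy model of MOESI transitions for a single line, from one cache's point
// of view, when it snoops a read request from another cache ("BusRd").
// Illustrative only -- real coherence controllers are nothing this simple.
#include <cstdio>

enum class Moesi { Modified, Owned, Exclusive, Shared, Invalid };

Moesi on_remote_read(Moesi s, bool& supplies_data) {
    switch (s) {
        case Moesi::Modified:      // dirty: forward the line cache-to-cache
            supplies_data = true;  // and keep write-back responsibility
            return Moesi::Owned;
        case Moesi::Owned:         // already the owner: keep forwarding
            supplies_data = true;
            return Moesi::Owned;
        case Moesi::Exclusive:     // clean: drop to Shared (who supplies the
            supplies_data = false; // data here is implementation-dependent;
            return Moesi::Shared;  // assume memory/L3 does)
        default:                   // Shared or Invalid: no change
            supplies_data = false;
            return s;
    }
}

int main() {
    bool supplies = false;
    Moesi next = on_remote_read(Moesi::Modified, supplies);
    std::printf("M -> %s, supplies data: %s\n",
                next == Moesi::Owned ? "O" : "?", supplies ? "yes" : "no");
}
```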

MESIF allows caches to Forward a copy of a clean cache line to another cache, instead of other caches having to re-read it from memory to get another Shared copy. (Intel since Nehalem has used a single large shared L3 cache for all cores, so all requests are ultimately backstopped by one L3 cache before checking memory anyway, but that's for all cores on one socket. Forwarding applies between sockets in a multi-socket system. Until Skylake-AVX512, the large shared L3 cache was inclusive. Which cache mapping technique is used in intel core i7 processor?)
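
Here is a similarly rough sketch of the Forward state's job (again just illustrative, with made-up names): among the caches holding a clean shared copy, at most one is the designated Forwarder; it answers the next read request cache-to-cache, and the requester becomes the new Forwarder.

```cpp
// Toy model of MESIF's Forward state for one cache line across N caches.
// Only the clean-sharing path is modeled; Modified/Exclusive handling is
// omitted. Illustrative only.
#include <cstdio>
#include <optional>
#include <vector>

enum class Mesif { Modified, Exclusive, Shared, Invalid, Forward };

struct Line {
    std::vector<Mesif> state;                 // per-cache state of this line
    explicit Line(int caches) : state(caches, Mesif::Invalid) {}

    void read(int req) {                      // read miss in cache `req`
        std::optional<int> fwd;
        for (int i = 0; i < (int)state.size(); ++i)
            if (state[i] == Mesif::Forward) fwd = i;

        if (fwd) {                            // served cache-to-cache
            state[*fwd] = Mesif::Shared;      // old Forwarder keeps a Shared copy
            std::printf("cache %d forwards the line to cache %d\n", *fwd, req);
        } else {                              // no Forwarder: go to memory/L3
            std::printf("cache %d fetches the line from memory/L3\n", req);
        }
        state[req] = Mesif::Forward;          // newest sharer becomes Forwarder
    }
};

int main() {
    Line line(4);
    line.read(0);   // miss: memory/L3 supplies it, cache 0 becomes F
    line.read(2);   // cache 0 forwards it, cache 2 becomes F
    line.read(3);   // cache 2 forwards it, cache 3 becomes F
}
```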

Wikipedia's MESIF article (linked above) has some comparison between MOESI and MESIF.


AMD in some cases has lower latency for sharing the same cache line between 2 cores. For example, see this graph of inter-core latency for Ryzen vs. quad-core Intel vs. many-core Intel (ring bus: Broadwell) vs. Skylake-X (worst).

Obviously there are many other differences between Intel and AMD designs that affect inter-core latency, like Intel using a ring bus or mesh, and AMD using a crossbar / all-to-all design with small clusters. (e.g. Ryzen has clusters of 4 cores that share an L3. That's why the inter-core latency for Ryzen has another step from core #3 to core #4.)

BTW, notice that the latency between two logical cores on the same physical core is much lower for both Intel and AMD. What are the latency and throughput costs of producer-consumer sharing of a memory location between hyper-siblings versus non-hyper siblings?.
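
If you want to see that difference on your own machine, a crude way is to ping-pong a cache line between two pinned threads and compare hyper-sibling CPU numbers against CPUs on different physical cores. The sketch below is my own (not from the linked Q&A), is Linux-specific (pthread_setaffinity_np), and assumes you look up which logical CPU numbers are siblings on your box, e.g. from /proc/cpuinfo or lstopo.

```cpp
// Crude producer-consumer ping-pong between two pinned threads, to compare
// round-trip latency for hyper-siblings vs. separate physical cores.
// Linux-only; build with: g++ -O2 -pthread pingpong.cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <pthread.h>
#include <sched.h>

static std::atomic<int> flag{0};
constexpr int kIters = 1000000;

static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void pingpong(int cpu, int me, int other) {
    pin_to_cpu(cpu);
    for (int i = 0; i < kIters; ++i) {
        while (flag.load(std::memory_order_acquire) != me) {}  // wait for our turn
        flag.store(other, std::memory_order_release);          // bounce the line back
    }
}

int main(int argc, char** argv) {
    // e.g. "./a.out 0 1" -- whether 0 and 1 are siblings depends on your topology
    int cpu_a = argc > 1 ? std::atoi(argv[1]) : 0;
    int cpu_b = argc > 2 ? std::atoi(argv[2]) : 1;

    auto t0 = std::chrono::steady_clock::now();
    std::thread a(pingpong, cpu_a, 0, 1);
    std::thread b(pingpong, cpu_b, 1, 0);
    a.join(); b.join();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    std::printf("average round-trip: %.1f ns\n", double(ns) / kIters);
}
```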

I didn't look for any academic papers that simulated MESI vs. MOESI on an otherwise-similar model.

Choice of MESIF vs. MOESI can be influenced by other design factors; Intel's use of a large tag-inclusive shared L3 cache as a backstop for coherency traffic is their solution to the same problem that MOESI solves: traffic between cores is handled efficiently by writing back to L3 and then sending the data from L3 to the requesting core, in the case where a core had the line in Modified state in a private L2 or L1d.

IIRC, some AMD designs (like some versions of Bulldozer-family) didn't have a last-level cache shared by all cores, and instead had larger L2 caches shared by pairs of cores. Higher-performance BD-family CPUs did also have a shared cache, though, so at least clean data could hit in L3.

