Cache coherence literature generally only refers to store buffers but not read buffers. Yet one somehow needs both?

Question

When reading about consistency models (namely the x86 TSO), authors in general resort to models where there are a bunch of CPUs, their associated store buffers and their private caches.

If my understanding is correct, store buffers can be described as queues where CPUs may put any store instruction they want to commit to memory. So as the name states, they are store buffers.
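
To make the queue picture concrete, here is a minimal sketch in Python. It is a toy model, not a description of any real microarchitecture: the `Core` class, its method names, and the dict standing in for memory are all made up for illustration.

```python
from collections import deque

class Core:
    """Toy core: stores enter a private FIFO and reach memory only when drained."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # The store is queued; it is not yet visible to anyone else.
        self.store_buffer.append((addr, value))

    def drain_one(self):
        # The oldest store commits first, so stores become visible in program order.
        addr, value = self.store_buffer.popleft()
        self.memory[addr] = value

memory = {"x": 0, "y": 0}
core = Core(memory)
core.store("x", 1)
core.store("y", 2)
before = dict(memory)        # neither store has reached memory yet
core.drain_one()
after_first = dict(memory)   # only the older store (to x) is visible
core.drain_one()
after_second = dict(memory)  # now both are visible
```

Note that only stores ever sit in this queue; nothing in the model puts a load into it.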

But when I read those papers, they tend to talk about the interaction of loads and stores, with statements such as "a later load can pass an earlier store" which is slightly confusing, as they almost seem to be talking as if the store buffer would have both loads and stores, when it doesn't -- right?
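
One way to read "a later load can pass an earlier store" is sketched below, again as a hypothetical toy model in Python (the `Core` class and variable names are assumptions for illustration). The load never enters the store buffer; it simply reads memory while the earlier store is still waiting in the buffer, so from the outside the load appears to have happened first.

```python
from collections import deque

class Core:
    """Toy core with a FIFO store buffer; loads read memory directly."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()   # pending (address, value) pairs

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Check our own pending stores first (newest wins), else read memory.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        while self.store_buffer:
            a, v = self.store_buffer.popleft()
            self.memory[a] = v

# Classic "store buffering" pattern: each core stores one variable, then loads the other.
memory = {"x": 0, "y": 0}
c0, c1 = Core(memory), Core(memory)
c0.store("x", 1)      # sits in c0's buffer
c1.store("y", 1)      # sits in c1's buffer
r0 = c0.load("y")     # reads memory: c1's store has not drained yet
r1 = c1.load("x")     # reads memory: c0's store has not drained yet
c0.drain()
c1.drain()
```

In this schedule both loads return 0 even though each comes after a store in its own program order, which is the outcome a sequentially consistent machine would forbid.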

So there must also be a load buffer that they are not (at least explicitly) talking about. Plus, those two must be somehow synchronized, so both know when it's acceptable to load from memory and to commit to memory -- or am I missing something?

Can anyone shed some light on this?

EDIT:

Let's look at a paragraph out of "A primer on memory consistency and cache coherence":


To understand the implementation of atomic RMWs in TSO, we consider the RMW as a load immediately followed by a store. The load part of the RMW cannot pass earlier loads due to TSO’s ordering rules. It might at first appear that the load part of the RMW could pass earlier stores in the write buffer, but this is not legal. If the load part of the RMW passes an earlier store, then the store part of the RMW would also have to pass the earlier store because the RMW is an atomic pair. But because stores are not allowed to pass each other in TSO, the load part of the RMW cannot pass an earlier store either.

More specifically,


The load part of the RMW cannot pass earlier loads due to TSO’s ordering rules. It might at first appear that the load part of the RMW could pass earlier stores in the write buffer

so they are referring to loads / stores crossing each other in the write buffer (which I assume is the same thing as the store buffer?)

Thanks

Recommended answer

Yes, write buffer = store buffer.

They're talking about what would happen if an atomic RMW were split up into a separate load and store, and the store buffer delayed another store (to a separate address) so that it became visible after the load but still before the store.

Obviously that would make it non-atomic, and violate the requirement that all x86 atomic RMW operations are also full barriers. (The lock prefix implies that, too.)

Normally it would be hard for a reader to detect that, but if the "separate address" was contiguous with the atomic RMW, then e.g. a dword store + a dword RMW could be observed by another thread doing a 64-bit qword load of both as one atomic operation.
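
The full-barrier behaviour can be sketched in the same toy style. This is only an illustration under simplifying assumptions: the hypothetical `locked_add` method drains the store buffer and then updates memory in one step, whereas real hardware achieves atomicity by holding the cache line for the duration of the operation.

```python
from collections import deque

class Core:
    """Toy core whose atomic RMW drains the store buffer before touching memory."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()   # pending (address, value) pairs

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def drain(self):
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value

    def locked_add(self, addr, delta):
        self.drain()                      # earlier stores become visible first
        old = self.memory[addr]           # load half of the RMW
        self.memory[addr] = old + delta   # store half, with nothing in between
        return old

# "lo" and "hi" stand in for the two adjacent dwords from the example above.
memory = {"lo": 0, "hi": 0}
core = Core(memory)
core.store("lo", 7)               # plain store, initially buffered
old = core.locked_add("hi", 1)    # atomic RMW on the neighbouring location
snapshot = dict(memory)           # what another thread's wide load would see
```

Because the buffer is drained before the load half runs, an observer that sees the RMW's result in `hi` also sees the earlier store to `lo`; the earlier store can never land between the two halves.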

Re: the title question:

Load buffers don't cause reordering. They wait for data that hasn't arrived yet; the load finishes "executing" when it reads data.

Store buffers are fundamentally different; they hold data for some time before it becomes globally visible.

x86's TSO memory model can be described as sequential-consistency + a store-buffer (with store-forwarding). See also "x86 mfence and C++ memory barrier" and the comments on that answer for more discussion about the fact that merely allowing StoreLoad reordering is not a sufficient description for cases where a thread reloads data that it just stored, especially if a load partially overlaps with recent stores so the HW merges data from the store buffer with data from L1d to complete the load before the store is globally visible.
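
Store-forwarding is the part that a plain "StoreLoad reordering" description misses, and a small sketch shows why. As before this is a hypothetical toy model in Python (the `Core` class and all names are assumptions), and it only handles whole-location forwarding, not the partial-overlap merging mentioned above.

```python
from collections import deque

class Core:
    """Toy core: loads are satisfied from the core's own store buffer when possible."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()   # pending (address, value) pairs

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Store-forwarding: the newest buffered store to this address wins.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

memory = {"x": 0}
writer, observer = Core(memory), Core(memory)
writer.store("x", 1)
seen_by_writer = writer.load("x")       # forwarded from the writer's own buffer
seen_by_observer = observer.load("x")   # still the old value from memory
```

The writer already reads its own new value while every other core still sees the old one, so the store is visible to its own thread before it is globally visible.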

Also note that x86 CPUs speculatively do reorder loads (at least Intel's do), but shoot down the mis-speculation to preserve the TSO memory model of no LoadLoad or LoadStore reordering. CPUs thus have to track loads vs. store ordering. Intel calls the combined store+load buffer tracking structure the "memory order buffer" (MOB). See "Size of store buffers on Intel hardware? What exactly is a store buffer?" for more.
