Memory ordering restrictions on x86 architecture
In his great book 'C++ Concurrency in Action' Anthony Williams writes the following (page 309):
For example, on x86 and x86-64 architectures, atomic load operations are always the same, whether tagged memory_order_relaxed or memory_order_seq_cst (see section 5.3.3). This means that code written using relaxed memory ordering may work on systems with an x86 architecture, where it would fail on a system with a finer-grained set of memory-ordering instructions such as SPARC.
Do I get this right that on x86 architecture all atomic load operations are memory_order_seq_cst? In addition, the cppreference page on std::memory_order mentions that on x86 release-acquire ordering is automatic.
If this restriction is valid, do the orderings still apply to compiler optimizations?
Yes, ordering still applies to compiler optimizations.
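To illustrate why the ordering argument still matters to the compiler, here is a minimal message-passing sketch. The names payload, ready, producer, consumer and run_message_passing are hypothetical and chosen only for this example. With release and acquire, the compiler may not move the plain write below the flag store or the plain read above the flag load; with relaxed it would be allowed to, even though the x86 hardware itself keeps stores in order.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, chosen for this sketch only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

void producer() {
    payload = 42;
    // release: the compiler may not sink the write to payload below this store.
    // With memory_order_relaxed it would be free to reorder the two writes.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // acquire: the compiler may not hoist the read of payload above this load.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Calling run_message_passing() should return 42 on any conforming implementation, because the release store synchronizes with the acquire load that reads true.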
Also, it is not entirely accurate that on x86 "atomic load operations are always the same".
On x86, all loads done with mov have acquire semantics and all stores done with mov have release semantics. So relaxed and acquire loads are plain mov instructions, and likewise relaxed and release stores. (In C++, release and acq_rel are not valid orderings for a load, and acquire and acq_rel are not valid orderings for a store.)
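As a small sketch of the loads and stores discussed here, the wrappers below use hypothetical names. The instruction names in the comments describe what mainstream compilers commonly emit on x86-64; that is an assumption worth checking against your own compiler's assembly output.

```cpp
#include <atomic>

// Hypothetical wrappers; the codegen comments are an assumption, not a guarantee.
int load_relaxed(const std::atomic<int>& a) {
    return a.load(std::memory_order_relaxed);  // commonly a plain mov
}

int load_acquire(const std::atomic<int>& a) {
    return a.load(std::memory_order_acquire);  // commonly the same plain mov
}

void store_relaxed(std::atomic<int>& a, int v) {
    a.store(v, std::memory_order_relaxed);     // commonly a plain mov
}

void store_release(std::atomic<int>& a, int v) {
    a.store(v, std::memory_order_release);     // commonly the same plain mov
}
```

Even when the emitted instruction is identical, the ordering argument still limits how the compiler may reorder the surrounding memory accesses.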
This however is not necessarily true for seq_cst: the architecture does not guarantee seq_cst semantics for mov. In fact, the x86 instruction set does not have any specific instruction for sequentially consistent loads and stores. Only atomic read-modify-write operations on x86 will have seq_cst semantics. Hence, you could get seq_cst semantics for loads by doing a fetch-and-add operation (lock xadd instruction) with an argument of 0, and seq_cst semantics for stores by doing a seq_cst exchange operation (xchg instruction) and discarding the previous value.
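In C++ terms, the two read-modify-write tricks just described could be written roughly as follows. The helper names load_via_rmw and store_via_rmw are hypothetical, and a compiler is free to pick a different but equivalent instruction sequence than the one named in the comments.

```cpp
#include <atomic>

// Hypothetical helper: a load expressed as a read-modify-write.
int load_via_rmw(std::atomic<int>& a) {
    // Adding 0 leaves the value unchanged and returns the current value.
    // On x86 this is the kind of operation that maps to lock xadd.
    return a.fetch_add(0, std::memory_order_seq_cst);
}

// Hypothetical helper: a store expressed as a read-modify-write.
void store_via_rmw(std::atomic<int>& a, int v) {
    // exchange returns the previous value, which is discarded here.
    // On x86 this is the kind of operation that maps to xchg.
    (void)a.exchange(v, std::memory_order_seq_cst);
}
```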
But you do not need to do both! As long as all seq_cst stores are done with xchg, seq_cst loads can be implemented simply with a mov. Dually, if all loads were done with lock xadd, seq_cst stores could be implemented simply with a mov.
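The reason one side is enough is that the only reordering x86 allows is a store followed by a later load from a different location. The classic store-buffering test below makes that visible. It is a sketch with a hypothetical helper name, count_both_zero. With seq_cst on all four operations the C++ standard rules out both threads reading 0, whereas with weaker orderings that outcome is permitted and can occur on x86.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering test `iterations` times and
// counts how often both threads read 0 (forbidden under seq_cst).
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

count_both_zero should return 0 for any iteration count, since in the single total order of seq_cst operations at least one load comes after the other thread's store.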
xchg and lock xadd are much slower than mov. Because a program usually has more loads than stores, it is convenient to do seq_cst stores with xchg so that the (more frequent) seq_cst loads can simply use a mov. This implementation choice is effectively part of the x86 Application Binary Interface (ABI): a compliant compiler must compile seq_cst stores to xchg (or an equivalent full-barrier sequence such as mov followed by mfence) so that seq_cst loads, which may appear in another translation unit compiled with a different compiler, can be done with the faster mov instruction.
Thus it is not an inherent property of x86 that seq_cst and acquire loads are done with the same instruction. It holds only because the ABI specifies that seq_cst stores be compiled to an xchg.