Does atomic_thread_fence(memory_order_seq_cst) have the semantics of a full memory barrier?
Problem description
A full/general memory barrier is one where all the LOAD and STORE operations specified before the barrier will appear to happen before all the LOAD and STORE operations specified after the barrier with respect to the other components of the system.
According to cppreference, memory_order_seq_cst is equal to memory_order_acq_rel plus a single total modification order on all operations so tagged. But as far as I know, neither an acquire nor a release fence in C++11 enforces a #StoreLoad (load after store) ordering. A release fence requires that no previous read/write can be reordered with any following write; an acquire fence requires that no following read/write can be reordered with any previous read. Please correct me if I am wrong ;)
Giving an example,
atomic<int> x;
atomic<int> y;
y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)
x.load(memory_order_relaxed); //(3)
Is an optimizing compiler allowed to reorder instruction (3) to before (1), so that the code effectively looks like:
x.load(memory_order_relaxed); //(3)
y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)
If this is a valid transformation, then it proves that atomic_thread_fence(memory_order_seq_cst) does not necessarily encompass the semantics of a full barrier.
atomic_thread_fence(memory_order_seq_cst) always generates a full barrier:
- x86_64: MFENCE
- PowerPC: hwsync
- Itanium: mf
- ARMv7 / ARMv8: dmb ish
- MIPS64: sync
The main caveat: an observing thread can still see the operations in a different order, and it does not matter which fences you use in the observed thread.
Is an optimizing compiler allowed to reorder instruction (3) to before (1)?
No, it is not allowed. But in a multithreaded program this ordering is globally visible only if:
- other threads use the same memory_order_seq_cst for their atomic read/write operations on these values
- or other threads also use the same atomic_thread_fence(memory_order_seq_cst); between their load() and store(). This approach does not guarantee sequential consistency in general, though, because sequential consistency is a stronger guarantee.
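The second condition can be illustrated with the classic store-buffering test. The following is a minimal sketch, not part of the original answer: the helper name count_both_zero and the iteration count are assumptions made for illustration. Both threads place a seq_cst fence between a relaxed store and a relaxed load, so under the C++11 fence rules at least one of the two loads is expected to observe the other thread's store, and the outcome r1 == 0 && r2 == 0 should never be counted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the original answer): runs the
// store-buffering test `iterations` times and returns how many runs
// ended with both threads reading 0.
int count_both_zero(int iterations) {
    assert(iterations > 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            y.store(1, std::memory_order_relaxed);                // (1)
            std::atomic_thread_fence(std::memory_order_seq_cst);  // (2)
            r1 = x.load(std::memory_order_relaxed);               // (3)
        });
        std::thread t2([&] {
            // Mirror image of t1: the same fence between store and load.
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        // join() synchronizes with thread completion, so reading the
        // plain ints r1 and r2 here is race-free.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling count_both_zero(1000) from main() should return 0. If the fence is removed from either thread, the both-zero outcome becomes permitted, although a given machine may still not show it in a short run.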
Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
§ 29.3 Order and consistency
§ 29.3 / 8
[ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]
How it can be mapped to assembler:
Case-1:
atomic<int> x, y;
y.store(1, memory_order_relaxed); //(1)
atomic_thread_fence(memory_order_seq_cst); //(2)
x.load(memory_order_relaxed); //(3)
This code is not always equivalent in meaning to Case-2, but it produces the same instructions between the STORE and the LOAD as when both the LOAD and the STORE use memory_order_seq_cst. That is sequential consistency, which prevents StoreLoad reordering. Case-2:
atomic<int> x, y;
y.store(1, memory_order_seq_cst); //(1)
x.load(memory_order_seq_cst); //(3)
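To compare the two cases on a particular target, one option is to put each into its own function and inspect the generated assembly (for example with the -S option of GCC or Clang at -O2). This is only a sketch: the function names case1 and case2 and the global variables are assumptions for illustration, and the exact instructions depend on the compiler and architecture.

```cpp
#include <atomic>

std::atomic<int> x{0}, y{0};

// Case-1: relaxed store and load separated by a seq_cst fence.
int case1() {
    y.store(1, std::memory_order_relaxed);                // (1)
    std::atomic_thread_fence(std::memory_order_seq_cst);  // (2)
    return x.load(std::memory_order_relaxed);             // (3)
}

// Case-2: seq_cst store followed by seq_cst load, no explicit fence.
int case2() {
    y.store(1, std::memory_order_seq_cst);                // (1)
    return x.load(std::memory_order_seq_cst);             // (3)
}
```

In both functions the compiler has to emit something between the store to y and the load from x that prevents StoreLoad reordering; which instruction that is can be read off the mapping list for each architecture.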
With some notes:
- it may add duplicate instructions (as in the following example for MIPS64)
- or it may use similar operations in the form of other instructions:
  - as in the alternative-3/4 mappings for x86_64, where the LOCK prefix flushes the store buffer exactly as MFENCE does, to prevent StoreLoad reordering
  - or as on ARMv8, where we know that DMB ISH is a full barrier which prevents StoreLoad reordering: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CHDGACJD.html
Guide for ARMv8-A, Table 13.1. Barrier parameters:
ISH: Any - Any. This means that both loads and stores must complete before the barrier. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.
Reordering of two instructions can be prevented by placing additional instructions between them. And as we can see, the instructions generated between the first STORE(seq_cst) and the next LOAD(seq_cst) are the same as those for FENCE(seq_cst) (atomic_thread_fence(memory_order_seq_cst)).
Mapping of C/C++11 memory_order_seq_cst to different CPU architectures for load(), store() and atomic_thread_fence():
Note that atomic_thread_fence(memory_order_seq_cst); always generates a full barrier:
- x86_64: STORE: MOV (into memory), MFENCE | LOAD: MOV (from memory) | fence: MFENCE
- x86_64-alt: STORE: MOV (into memory) | LOAD: MFENCE, MOV (from memory) | fence: MFENCE
- x86_64-alt3: STORE: (LOCK) XCHG | LOAD: MOV (from memory) | fence: MFENCE (full barrier)
- x86_64-alt4: STORE: MOV (into memory) | LOAD: LOCK XADD(0) | fence: MFENCE (full barrier)
- PowerPC: STORE: hwsync; st | LOAD: hwsync; ld; cmp; bc; isync | fence: hwsync
- Itanium: STORE: st.rel; mf | LOAD: ld.acq | fence: mf
- ARMv7: STORE: dmb ish; str; dmb ish | LOAD: ldr; dmb ish | fence: dmb ish
- ARMv7-alt: STORE: dmb ish; str | LOAD: dmb ish; ldr; dmb ish | fence: dmb ish
- ARMv8 (AArch32): STORE: STL | LOAD: LDA | fence: DMB ISH (full barrier)
- ARMv8 (AArch64): STORE: STLR | LOAD: LDAR | fence: DMB ISH (full barrier)
- MIPS64: STORE: sync; sw; sync | LOAD: sync; lw; sync | fence: sync
All mappings of C/C++11 semantics to different CPU architectures for load(), store() and atomic_thread_fence() are described here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
Sequential consistency prevents StoreLoad reordering, and the instructions generated between a store(memory_order_seq_cst) and the next load(memory_order_seq_cst) are the same as those generated for atomic_thread_fence(memory_order_seq_cst). Therefore atomic_thread_fence(memory_order_seq_cst) prevents StoreLoad reordering.