Does atomic_thread_fence(memory_order_seq_cst) have the semantics of a full memory barrier?


Question


A full/general memory barrier is one where all the LOAD and STORE operations specified before the barrier will appear to happen before all the LOAD and STORE operations specified after the barrier with respect to the other components of the system.

According to cppreference, memory_order_seq_cst is equal to memory_order_acq_rel plus a single total modification order on all operations so tagged. But as far as I know, neither an acquire nor a release fence in C++11 enforces #StoreLoad (load after store) ordering. A release fence requires that no preceding read/write be reordered with any following write; an acquire fence requires that no following read/write be reordered with any preceding read. Please correct me if I am wrong ;)

For example:

atomic<int> x;
atomic<int> y;

y.store(1, memory_order_relaxed);            //(1)
atomic_thread_fence(memory_order_seq_cst);   //(2)
x.load(memory_order_relaxed);                //(3)

Is an optimizing compiler allowed to reorder instruction (3) to before (1), so that it effectively looks like:

x.load(memory_order_relaxed);                //(3)
y.store(1, memory_order_relaxed);            //(1)
atomic_thread_fence(memory_order_seq_cst);   //(2)

If this is a valid transformation, then it proves that atomic_thread_fence(memory_order_seq_cst) does not necessarily have the semantics of a full barrier.

Answer

atomic_thread_fence(memory_order_seq_cst) always generates a full barrier:

  • x86_64: MFENCE
  • PowerPC: hwsync
  • Itanium: mf
  • ARMv7 / ARMv8: dmb ish
  • MIPS64: sync

The main thing: an observing thread can simply observe the operations in a different order, regardless of which fences you use in the observed thread.

Is an optimizing compiler allowed to reorder instruction (3) to before (1)?

No, it isn't allowed. But in a multithreaded program this holds for what is globally visible only if:

  • other threads use the same memory_order_seq_cst for their atomic read/write operations on these values
  • or other threads also use the same atomic_thread_fence(memory_order_seq_cst); between their load() and store(); but this approach doesn't guarantee sequential consistency in general, because sequential consistency is a stronger guarantee
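
As a hedged illustration of the second condition, the classic store-buffering test below runs the question's sequence (1)-(3) in one thread and its mirror image in another. The function name count_forbidden_outcomes and the round count are assumptions made for this sketch, not part of the original answer. With a seq_cst fence in both threads, the C++ memory model does not allow both loads to return 0.

```cpp
#include <atomic>
#include <thread>

// Store-buffering (Dekker-style) litmus test.
// Returns how many rounds ended with both threads reading 0, the outcome
// that StoreLoad reordering would make possible.
// NOTE: the function name and structure are hypothetical, for illustration.
int count_forbidden_outcomes(int rounds) {
    int forbidden = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            y.store(1, std::memory_order_relaxed);                // (1)
            std::atomic_thread_fence(std::memory_order_seq_cst);  // (2)
            r1 = x.load(std::memory_order_relaxed);               // (3)
        });
        std::thread t2([&] {
            // Mirror image: store to x, fence, then load y.
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = y.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();  // join() makes r1 and r2 safe to read here

        if (r1 == 0 && r2 == 0) ++forbidden;
    }
    return forbidden;
}
```

Calling count_forbidden_outcomes(2000) should return 0 on a conforming implementation. A passing run is consistent with the guarantee but cannot prove it; if either fence is removed, the outcome where both loads return 0 becomes permitted.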

Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf

§ 29.3 Order and consistency

§ 29.3 / 8

[ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]


How this maps to assembly:

Case-1:

atomic<int> x, y;

y.store(1, memory_order_relaxed);            //(1)
atomic_thread_fence(memory_order_seq_cst);   //(2)
x.load(memory_order_relaxed);                //(3)

This code is not always equivalent in meaning to Case-2, but it produces the same instructions between the STORE and the LOAD as when both the LOAD and the STORE use memory_order_seq_cst, that is, sequential consistency, which prevents StoreLoad reordering. Case-2:

atomic<int> x, y;

y.store(1, memory_order_seq_cst);            //(1)

x.load(memory_order_seq_cst);                //(3)

With some notes:

  1. it may add duplicate instructions (as in the following example for MIPS64)
  2. or may use similar operations in the form of other instructions:

Guide for ARMv8-A

Table 13.1. Barrier parameters

ISH: Any - Any

Any - Any: This means that both loads and stores must complete before the barrier. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.

Reordering of two instructions can be prevented by placing additional instructions between them. As we can see, the instructions generated between the first STORE(seq_cst) and the next LOAD(seq_cst) are the same as those generated for FENCE(seq_cst) (atomic_thread_fence(memory_order_seq_cst)).

Mapping of C/C++11 memory_order_seq_cst to different CPU architectures for load(), store(), and atomic_thread_fence():

Note that atomic_thread_fence(memory_order_seq_cst); always generates a full barrier:

  • x86_64: STORE = MOV (into memory), MFENCE | LOAD = MOV (from memory) | fence = MFENCE
  • x86_64-alt: STORE = MOV (into memory) | LOAD = MFENCE, MOV (from memory) | fence = MFENCE
  • x86_64-alt3: STORE = (LOCK) XCHG | LOAD = MOV (from memory) | fence = MFENCE (full barrier)
  • x86_64-alt4: STORE = MOV (into memory) | LOAD = LOCK XADD(0) | fence = MFENCE (full barrier)
  • PowerPC: STORE = hwsync; st | LOAD = hwsync; ld; cmp; bc; isync | fence = hwsync
  • Itanium: STORE = st.rel; mf | LOAD = ld.acq | fence = mf
  • ARMv7: STORE = dmb ish; str; dmb ish | LOAD = ldr; dmb ish | fence = dmb ish
  • ARMv7-alt: STORE = dmb ish; str | LOAD = dmb ish; ldr; dmb ish | fence = dmb ish
  • ARMv8 (AArch32): STORE = STL | LOAD = LDA | fence = DMB ISH (full barrier)
  • ARMv8 (AArch64): STORE = STLR | LOAD = LDAR | fence = DMB ISH (full barrier)
  • MIPS64: STORE = sync; sw; sync | LOAD = sync; lw; sync | fence = sync

All mappings of C/C++11 semantics to different CPU architectures for load(), store(), and atomic_thread_fence() are described here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

Sequential consistency prevents StoreLoad reordering, and the instructions generated between a sequentially consistent store(memory_order_seq_cst) and the next load(memory_order_seq_cst) are the same as those generated for atomic_thread_fence(memory_order_seq_cst). Therefore atomic_thread_fence(memory_order_seq_cst) prevents StoreLoad reordering.
