Loads and stores reordering on ARM

Question

I'm not an ARM expert, but won't these stores and loads be subject to reordering, at least on some ARM architectures?

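  #include <atomic>
  using namespace std;

  atomic<int> atomic_var;   // shared atomic, accessed with relaxed ordering only
  int nonAtomic_var;
  int nonAtomic_var2;

  void foo()
  {
          // two relaxed stores to the same atomic object
          atomic_var.store(111, memory_order_relaxed);
          atomic_var.store(222, memory_order_relaxed);
  }

  void bar()
  {
          // two relaxed loads from the same atomic object
          nonAtomic_var = atomic_var.load(memory_order_relaxed);
          nonAtomic_var2 = atomic_var.load(memory_order_relaxed);
  }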

I've had no success in making the compiler put memory barriers between them.

I tried the following, cross-compiling on an x64 host:

$ arm-linux-gnueabi-g++ -mcpu=cortex-a9 -std=c++11 -S -O1 test.cpp

I got:

  _Z3foov:
          .fnstart
  .LFB331:
          @ args = 0, pretend = 0, frame = 0
          @ frame_needed = 0, uses_anonymous_args = 0
          @ link register save eliminated.
          movw    r3, #:lower16:.LANCHOR0
          movt    r3, #:upper16:.LANCHOR0
          mov     r2, #111
          str     r2, [r3]
          mov     r2, #222
          str     r2, [r3]
          bx      lr
          ;...
  _Z3barv:
          .fnstart
  .LFB332:
          @ args = 0, pretend = 0, frame = 0
          @ frame_needed = 0, uses_anonymous_args = 0
          @ link register save eliminated.
          movw    r3, #:lower16:.LANCHOR0
          movt    r3, #:upper16:.LANCHOR0
          ldr     r2, [r3]
          str     r2, [r3, #4]
          ldr     r2, [r3]
          str     r2, [r3, #8]
          bx      lr

Are loads and stores to the same location never reordered on ARM? I couldn't find such a restriction in the ARM docs.

I'm asking with regard to the C++11 standard, which states that:

All modifications to any particular atomic variable occur in a total order that is specific to this one atomic variable.

Recommended answer

The total order for a single variable exists because of cache coherency (MESI): a store can't commit from the store buffer into L1d cache and become globally visible to other threads unless the core owns exclusive access to that cache line. (MESI Exclusive or Modified state.)

That C++ guarantee doesn't require any barriers on any normal CPU architecture, because all normal ISAs have coherent caches, normally using a variant of MESI. This is why volatile happens to work as a legacy / UB version of mo_relaxed atomic on mainstream C++ implementations (but generally don't do that). See also "When to use volatile with multi threading?" for more details.

(Some systems exist with two different kinds of CPU that share memory, e.g. microcontroller + DSP, but C++ std::thread won't start threads across cores that don't share a coherent view of that memory. So compilers only have to do code-gen for ARM cores in the same inner-shared coherency domain.)

For any given atomic object, a total order of modification by all threads will always exist (as guaranteed by the ISO C++ standard you quoted), but you don't know ahead of time what it's going to be unless you establish synchronization between threads.

For example, different runs of this program could have both loads go first, or one load, then both stores, then the other load.

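This total order (for a single variable) will be compatible with program order for each thread, but is otherwise an arbitrary interleaving of the threads' program orders.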
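As a rough illustration of the interleavings described above, the sketch below runs the question's two stores and two loads in separate threads and classifies the result. The names `run_once`, `position_in_order` and `allowed` are hypothetical helpers, not part of the original code, and the harness assumes a hosted C++11 implementation with working `std::thread`.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical helper: run the question's foo() and bar() bodies concurrently
// once and return the two values the reader loaded, in program order.
std::pair<int, int> run_once() {
    std::atomic<int> atomic_var{0};
    int first = 0, second = 0;

    std::thread writer([&] {
        atomic_var.store(111, std::memory_order_relaxed);
        atomic_var.store(222, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        first = atomic_var.load(std::memory_order_relaxed);
        second = atomic_var.load(std::memory_order_relaxed);
    });
    writer.join();
    reader.join();  // join() makes first/second safe to read here
    return {first, second};
}

// Position of a value in the modification order 0 -> 111 -> 222,
// or -1 for a value that was never stored.
int position_in_order(int v) {
    return v == 0 ? 0 : v == 111 ? 1 : v == 222 ? 2 : -1;
}

// True if the pair is consistent with the modification order: the second
// load may see the same value as the first or a later one, never an older one.
bool allowed(std::pair<int, int> r) {
    int a = position_in_order(r.first);
    int b = position_in_order(r.second);
    return a >= 0 && b >= 0 && a <= b;
}
```

Calling `allowed(run_once())` in a loop should return true on every run. Which of the permitted pairs actually shows up depends on timing and cannot be known at compile time.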

memory_order_relaxed only makes the operation on that variable atomic; it provides no ordering wrt. anything else. The only ordering that's fixed at compile time is wrt. other accesses to the same atomic variable by this thread.
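To make the "no ordering wrt. anything else" point concrete by contrast, here is a minimal sketch of the usual way to order a plain variable against an atomic flag, using release/acquire rather than relaxed. The function name `message_passing_demo` and the variables in it are made up for illustration and do not come from the question.

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: publish a plain int through an atomic flag.
// With release/acquire the consumer is guaranteed to see payload == 42.
// If both flag accesses were relaxed instead, nothing would order the
// payload accesses against the flag, and reading payload would be a data race.
int message_passing_demo() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the flag is published
        }
        seen = payload;               // ordered after the producer's write
    });
    producer.join();
    consumer.join();
    return seen;
}
```

On ARM this is where a compiler would have to emit a barrier or an acquire/release instruction, unlike the relaxed accesses in the question.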

Different threads will agree on the modification order for this variable, but might disagree on the global modification order for all objects. (ARMv8 made the ARM memory model multi-copy-atomic, so this is impossible there, and probably no real earlier ARM violated that. POWER, however, does in real life allow two independent reader threads to disagree on the order of stores by two other independent writer threads. This is called IRIW reordering; see "Will two atomic writes to different locations in different threads always be seen in the same order by other threads?")

The fact that IRIW reordering is a possibility when multiple variables are involved is (among other things) why it even needs to be said that a total modification order does always exist for each individual variable separately.

For an all-thread total order to exist, you need all your atomic accesses to use seq_cst, which would involve barriers. But that still wouldn't of course fully determine at compile time what that order will be; different timings on different runs will lead to acquire loads seeing a certain store or not.
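The IRIW case can be written down as a small litmus test. The sketch below uses seq_cst for every access, so the outcome in which the two readers disagree about the order of the two stores is forbidden by the standard. `IriwResult`, `run_iriw` and `readers_disagree` are hypothetical names chosen for this example.

```cpp
#include <atomic>
#include <thread>

// Values observed by the two reader threads in one run.
struct IriwResult { int r1, r2, r3, r4; };

// Hypothetical litmus test: two independent writers, two independent readers.
IriwResult run_iriw() {
    std::atomic<int> x{0}, y{0};
    IriwResult res{};

    std::thread w1([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread w2([&] { y.store(1, std::memory_order_seq_cst); });
    std::thread rd1([&] {
        res.r1 = x.load(std::memory_order_seq_cst);
        res.r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread rd2([&] {
        res.r3 = y.load(std::memory_order_seq_cst);
        res.r4 = x.load(std::memory_order_seq_cst);
    });
    w1.join(); w2.join(); rd1.join(); rd2.join();
    return res;
}

// True if reader 1 saw x's store but not y's, while reader 2 saw y's store
// but not x's, i.e. the readers disagree on the order of the two stores.
bool readers_disagree(const IriwResult& r) {
    return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}
```

With seq_cst, `readers_disagree(run_iriw())` should never be true. With weaker orderings the standard permits it, although, as noted above, only some hardware such as POWER would actually produce it.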

Are loads and stores to the same location never reordered on ARM?

From within a single thread, no, they are not reordered. If you do multiple stores to a memory location, the last one in program order will always appear as the last to other threads, i.e. once the dust settles, the memory location will have the value stored by the last store. Anything else would break the illusion of program order for threads reloading their own stores.

Some of the ordering guarantees in the C++ standard are even called "write-write coherence" and other kinds of coherence. ISO C++ doesn't explicitly require coherent caches (an implementation on an ISA that needs explicit flushing is possible), but such an implementation would not be efficient.

http://eel.is/c++draft/intro.races#19

[ Note: The four preceding coherence requirements effectively disallow compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations. — end note ]


Most of the above is about modification order, not LoadLoad reordering.

That is a separate thing. C++ guarantees read-read coherence, i.e. that 2 reads of the same atomic object by the same thread happen in program order relative to each other.

http://eel.is/c++draft/intro.races#16

If a value computation A of an atomic object M happens before a value computation B of M, and A takes its value from a side effect X on M, then the value computed by B shall either be the value stored by X or the value stored by a side effect Y on M, where Y follows X in the modification order of M. [ Note: This requirement is known as read-read coherence. — end note ]

A "value computation" is a read, aka a load, of a variable. The final clause of that quote ("where Y follows X in the modification order of M") is the part that guarantees that later reads in the same thread can't observe earlier writes from other threads (earlier than a write they already saw).

That's one of the 4 conditions that the previous quote I linked was talking about.
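Read-read coherence can be exercised with a monotonically increasing counter: since the modification order is 0, 1, 2, ..., a later relaxed load in the same thread must never return a smaller value than an earlier one. This is only a sketch; `count_backwards_reads` is a hypothetical name, and a run that finds no violation demonstrates the guarantee rather than proving it.

```cpp
#include <atomic>
#include <thread>

// Hypothetical check: a writer stores 1..iterations with relaxed ordering
// while a reader repeatedly loads the counter twice. Returns how many times
// the second load saw an older value than the first; coherence requires 0.
int count_backwards_reads(int iterations) {
    std::atomic<int> counter{0};
    int violations = 0;

    std::thread writer([&] {
        for (int i = 1; i <= iterations; ++i)
            counter.store(i, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        int a = 0, b = 0;
        do {
            a = counter.load(std::memory_order_relaxed);
            b = counter.load(std::memory_order_relaxed);
            if (b < a) ++violations;   // would break read-read coherence
        } while (b < iterations);      // stop once the final store is seen
    });
    writer.join();
    reader.join();
    return violations;
}
```

The two loads in the reader loop correspond to the two plain `ldr` instructions in the question's `bar()`; no barrier is needed between them for this property to hold.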

The fact that compilers compile it to two plain ARM loads is proof enough that the ARM ISA also guarantees this. (Because we know for sure that ISO C++ requires it.)

I'm not familiar with ARM manuals but presumably it's in there somewhere.

See also A Tutorial Introduction to the ARM and POWER Relaxed Memory Models - a paper that goes into significant detail about what reorderings are/aren't allowed for various test cases.
