Using an atomic read-modify-write operation in a release sequence


Question

Say I create an object of type Foo in thread #1 and want to be able to access it in thread #3.
I can try something like:

std::atomic<int> sync{10};
Foo *fp;

// thread 1: modifies sync: 10 -> 11
fp = new Foo;
sync.store(11, std::memory_order_release);

// thread 2a: modifies sync: 11 -> 12
while (sync.load(std::memory_order_relaxed) != 11);
sync.store(12, std::memory_order_relaxed);

// thread 3
while (sync.load(std::memory_order_acquire) != 12);
fp->do_something();

  • The store-release in thread #1 orders the construction of Foo before the update to 11.
  • Thread #2a non-atomically increments the value of sync to 12 (a separate load followed by a separate store).
  • The synchronizes-with relationship between threads #1 and #3 is only established when #3 loads 11.

The scenario is broken because thread #3 spins until it loads 12, which may arrive out of order with respect to 11, and Foo is not ordered with 12 (due to the relaxed operations in thread #2a).
This is somewhat counter-intuitive, since the modification order of sync is 10 → 11 → 12.

The standard says (§ 1.10.1-6):

an atomic store-release synchronizes with a load-acquire that takes its value from the store (29.3). [ Note: Except in the specified cases, reading a later value does not necessarily ensure visibility as described below. Such a requirement would sometimes interfere with efficient implementation. —end note ]

It also says (§ 1.10.1-5):

A release sequence headed by a release operation A on an atomic object M is a maximal contiguous subsequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation
- is performed by the same thread that performed A, or
- is an atomic read-modify-write operation.
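To make the second bullet of the quoted definition concrete, here is a minimal sketch in which several threads extend a release sequence with relaxed read-modify-write operations. The names `run_release_sequence_demo`, `counter`, `payload`, and `kBumpers` are illustrative assumptions and do not come from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Returns the payload value observed by the consumer thread.
int run_release_sequence_demo() {
    constexpr int kBumpers = 3;          // hypothetical number of RMW threads
    std::atomic<int> counter{0};
    int payload = 0;                     // plain, non-atomic data
    int seen = -1;

    std::thread consumer([&] {
        // Reading the final value reads from the last fetch_add, which is
        // part of the release sequence headed by the store of 1 below.
        while (counter.load(std::memory_order_acquire) != 1 + kBumpers) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    std::vector<std::thread> bumpers;
    for (int i = 0; i < kBumpers; ++i) {
        bumpers.emplace_back([&] {
            // Wait until the release store is visible, so the RMW is placed
            // after it in the modification order of counter.
            while (counter.load(std::memory_order_relaxed) == 0) {
                std::this_thread::yield();
            }
            counter.fetch_add(1, std::memory_order_relaxed);  // atomic RMW
        });
    }

    std::thread producer([&] {
        payload = 7;
        counter.store(1, std::memory_order_release);  // operation A
    });

    producer.join();
    for (auto& t : bumpers) t.join();
    consumer.join();
    assert(counter.load() == 1 + kBumpers);
    return seen;
}
```

Every modification after the release store is an RMW, so the sequence stays contiguous and the consumer's acquire load synchronizes with the producer's store even though it never reads the value 1 itself.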

Now, thread #2a is modified to use an atomic read-modify-write operation:

// thread 2b: modifies sync: 11 -> 12
int val;
while ((val = 11) && !sync.compare_exchange_weak(val, 12, std::memory_order_relaxed));

If this release sequence is correct, Foo is synchronized with thread #3 when it loads either 11 or 12. My questions about the use of an atomic read-modify-write are:

  • Does the scenario with thread #2b constitute a correct release sequence?

And if so:

  • What are the specific properties of a read-modify-write operation that ensure this scenario is correct?

Answer

Does the scenario with thread #2b constitute a correct release sequence?

Yes, per your quote from the standard.
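For readers who want to try it, the thread #1 / #2b / #3 scenario can be assembled into one self-contained function roughly as follows. This is only a sketch: the body of `Foo`, the value 42, and the function name `run_scenario_2b` are assumptions added for illustration, and the spin loops yield to keep the demo friendly to small machines.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Foo {
    int value = 42;                                   // illustrative payload
    int do_something() const { return value; }
};

// Runs the scenario once and returns what thread #3 read through fp.
int run_scenario_2b() {
    std::atomic<int> sync{10};
    Foo* fp = nullptr;                                // plain, non-atomic pointer
    int seen = 0;

    std::thread t3([&] {
        // Acquire load of 12 reads from the relaxed CAS, which belongs to
        // the release sequence headed by thread #1's store of 11.
        while (sync.load(std::memory_order_acquire) != 12) {
            std::this_thread::yield();
        }
        assert(fp != nullptr);
        seen = fp->do_something();
    });

    std::thread t2b([&] {
        int expected = 11;
        // Relaxed RMW: 11 -> 12. On failure, expected is overwritten with
        // the current value, so it is reset before retrying.
        while (!sync.compare_exchange_weak(expected, 12,
                                           std::memory_order_relaxed)) {
            expected = 11;
            std::this_thread::yield();
        }
    });

    std::thread t1([&] {
        fp = new Foo;
        sync.store(11, std::memory_order_release);    // heads the release sequence
    });

    t1.join();
    t2b.join();
    t3.join();
    delete fp;
    return seen;
}
```

Calling `run_scenario_2b()` from `main` should return the payload value. A passing run does not prove the ordering, of course; the guarantee comes from the standard's wording, and a tool such as ThreadSanitizer is a better check than repeated execution.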

What are the specific properties of a read-modify-write operation that ensure this scenario is correct?

Well, the somewhat circular answer is that the only important specific property is that "The C++ standard defines it so".

As a practical matter, one may ask why the standard defines it like this. I don't think you'll find that the answer has a deep theoretical basis: I think the committee could also have defined it such that the RMW doesn't participate in the release sequence, or (perhaps with more difficulty) have defined it so that both the RMW and the separate mo_relaxed load and store participate in the release sequence, without compromising the "soundness" of the model.

They already give a performance-related reason as to why they didn't choose the latter approach:

Such a requirement would sometimes interfere with efficient implementation.

In particular, on any hardware platform that allowed load-store reordering, it would imply that even mo_relaxed loads and/or stores might require barriers! Such platforms exist today. Even on more strongly ordered platforms, it may inhibit compiler optimizations.
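Put differently, if thread #2a keeps its separate load and store, the ordering has to be requested explicitly rather than coming for free. One way to do that, sketched below under the same illustrative assumptions as the question's code (the `Foo` body and the name `run_scenario_2a_fixed` are made up here), is to make the load an acquire and the store a release, so that happens-before chains through thread #2a.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Foo {
    int value = 42;                                   // illustrative payload
    int do_something() const { return value; }
};

// Thread #2a with a separate load and store, made correct by explicit
// acquire/release ordering instead of relying on a release sequence.
int run_scenario_2a_fixed() {
    std::atomic<int> sync{10};
    Foo* fp = nullptr;
    int seen = 0;

    std::thread t3([&] {
        // Synchronizes with thread #2a's release store of 12.
        while (sync.load(std::memory_order_acquire) != 12) {
            std::this_thread::yield();
        }
        assert(fp != nullptr);
        seen = fp->do_something();
    });

    std::thread t2a([&] {
        // Acquire: synchronizes with thread #1's release store of 11.
        while (sync.load(std::memory_order_acquire) != 11) {
            std::this_thread::yield();
        }
        // Release: everything that happened before this store, including
        // thread #1's write to fp, is visible to an acquire load of 12.
        sync.store(12, std::memory_order_release);
    });

    std::thread t1([&] {
        fp = new Foo;
        sync.store(11, std::memory_order_release);
    });

    t1.join();
    t2a.join();
    t3.join();
    delete fp;
    return seen;
}
```

The store of 12 here is not part of thread #1's release sequence; correctness comes from two synchronizes-with edges combined transitively. On weakly ordered hardware those stronger orderings may translate into barriers, which is the cost the quoted note alludes to.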

So why didn't they then take the other "consistent" approach of not requiring mo_relaxed RMWs to participate in the release sequence? Probably because existing hardware implementations of RMW operations provide such guarantees, and the nature of RMW operations makes it likely that this will remain true in the future. In particular, as Peter points out in the comments, RMW operations, even with mo_relaxed, are conceptually and practically [1] stronger than separate loads and stores: they would be quite useless if they didn't have a consistent total order.

Once you accept that this is how hardware works, it makes sense from a performance angle to align the standard with it: if you didn't, you'd have people using more restrictive orderings such as mo_acq_rel just to get the release sequence guarantees, and on real hardware that has a weakly ordered CAS, this doesn't come for free.


[1] The "practically" part means that even the weakest forms of RMW instructions are usually relatively "expensive" operations taking a dozen cycles or more on modern hardware, while mo_relaxed loads and stores generally just compile to plain loads and stores in the target ISA.
