Can volatile but unfenced reads yield indefinitely stale values? (on real hardware)

Question

In answering a related question, a further question about the OP's situation came up that I was unsure about. It's mostly a processor architecture question, but it has a knock-on question about the C++11 memory model as well.

Basically, the OP's code was looping infinitely at higher optimization levels because of the following code (slightly modified for simplicity):

while (true) {
    uint8_t ov = bits_; // bits_ is some "uint8_t" non-local variable
    if (ov & MASK) {
        continue;
    }
    if (ov == __sync_val_compare_and_swap(&bits_, ov, ov | MASK)) {
        break;
    }
}

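where __sync_val_compare_and_swap() is GCC's atomic CAS built-in. GCC (reasonably) optimized this into an infinite loop in the case that bits_ & MASK was detected to be true before entering the loop, skipping the CAS operation entirely. Because bits_ is an ordinary (non-atomic, non-volatile) variable, the compiler may assume no other thread changes it between iterations, so it is free to hoist the load out of the loop. I therefore suggested the following change (which works):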

while (true) {
    uint8_t ov = bits_; // bits_ is some "uint8_t" non-local variable
    if (ov & MASK) {
        __sync_synchronize();
        continue;
    }
    if (ov == __sync_val_compare_and_swap(&bits_, ov, ov | MASK)) {
        break;
    }
}

After I answered, OP noted that changing bits_ to volatile uint8_t seems to work as well. I suggested not to go that route, since volatile should not normally be used for synchronization, and there doesn't seem to be much downside to using a fence here anyway.

However, I thought about it more, and in this case the semantics are such that it doesn't really matter if the ov & MASK check is based on a stale value, as long as it's not based on an indefinitely stale one (i.e. as long as the loop eventually exits), since the actual attempt to update bits_ is synchronized. So is volatile enough here to guarantee that this loop eventually terminates if bits_ is updated by another thread such that bits_ & MASK == false, on any existing processor? In other words, in the absence of an explicit memory fence, is it practically possible for reads not optimized out by the compiler to be effectively optimized out by the processor instead, indefinitely? (Edit: to be clear, I'm asking here about what modern hardware might actually do given the assumption that reads are emitted in a loop by the compiler, so it's not technically a language question, although expressing it in terms of C++ semantics is convenient.)

That's the hardware angle. To also make it an answerable question about the C++11 memory model, consider the following variation of the code above:

// bits_ is "std::atomic<unsigned char>"
unsigned char ov = bits_.load(std::memory_order_relaxed);
while (true) {
    if (ov & MASK) {
        ov = bits_.load(std::memory_order_relaxed);
        continue;
    }
    // compare_exchange_weak also updates ov if the exchange fails
    if (bits_.compare_exchange_weak(ov, ov | MASK, std::memory_order_acq_rel)) {
        break;
    }
}
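For reference, here is a minimal, self-contained sketch that wraps the loop above into a tiny spin lock so it can actually be exercised from two threads. The names `SpinFlag`, `lock`, `unlock` and `run_demo`, and the value of `MASK`, are hypothetical and chosen only for illustration. The unlock path (clearing the bit with a release `fetch_and`) is also an assumption, since the question does not show how `bits_` gets cleared.

```cpp
#include <atomic>
#include <thread>

// Hypothetical "locked" bit; the question does not give MASK's value.
constexpr unsigned char MASK = 0x01;

struct SpinFlag {
    std::atomic<unsigned char> bits_{0};

    // The loop from the question: relaxed loads while the bit is set,
    // then a CAS that sets the bit (acq_rel implies acquire on success).
    void lock() {
        unsigned char ov = bits_.load(std::memory_order_relaxed);
        while (true) {
            if (ov & MASK) {
                ov = bits_.load(std::memory_order_relaxed);
                continue;
            }
            // On failure, compare_exchange_weak writes the current value into ov.
            if (bits_.compare_exchange_weak(ov, static_cast<unsigned char>(ov | MASK),
                                            std::memory_order_acq_rel)) {
                break;
            }
        }
    }

    // Assumed unlock: clear the bit with release ordering so that the next
    // successful CAS in lock() synchronizes with it.
    void unlock() {
        bits_.fetch_and(static_cast<unsigned char>(~MASK), std::memory_order_release);
    }
};

// Two threads increment a plain (non-atomic) counter under the lock.
long run_demo(int iterations_per_thread) {
    SpinFlag flag;
    long counter = 0;
    auto worker = [&] {
        for (int i = 0; i < iterations_per_thread; ++i) {
            flag.lock();
            ++counter;
            flag.unlock();
        }
    };
    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
    return counter;
}
```

If the lock is correct, `run_demo(n)` returns exactly `2 * n`. The relaxed loads only decide when it is worth trying the CAS. Correctness comes from the acquire CAS pairing with the release `fetch_and`.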

cppreference claims that std::memory_order_relaxed implies "no constraints on reordering of memory accesses around the atomic variable". So, independent of what actual hardware will or will not do, does this imply that bits_.load(std::memory_order_relaxed) could technically never read an updated value after bits_ is updated on another thread in a conforming implementation?

I found this in the standard (29.4 p13):

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

So apparently waiting "infinitely long" for an updated value is (mostly?) out of the question, but there's no hard guarantee of any specific time interval of freshness other than that it should be "reasonable". Still, the question about actual hardware behavior stands.

Answer

C++11 atomics deal with three issues:


  1. ensuring that a complete value is read or written without another thread ever observing a partially written value; this prevents tearing.
  2. ensuring that the compiler does not reorder instructions within a thread across an atomic read or write; this ensures ordering within the thread.
  3. ensuring (for appropriate choices of memory order parameters) that data written within a thread prior to an atomic write will be seen by a thread that reads the atomic variable and sees the value that was written; this is visibility.
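To make the third point concrete, here is a minimal sketch of the usual release/acquire publication pattern. The names `publish_and_read`, `payload` and `ready` are hypothetical and not taken from the question. The writer stores ordinary data and then does a release store to an atomic flag. Once the reader's acquire load sees `true`, it is guaranteed to see the data written before that release store.

```cpp
#include <atomic>
#include <thread>

// Returns the value the consumer thread observed for the non-atomic payload.
int publish_and_read() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag used to publish it
    int observed = -1;

    std::thread consumer([&] {
        // Spin until the flag is seen; this acquire load pairs with the
        // release store below.
        while (!ready.load(std::memory_order_acquire)) {
        }
        observed = payload;  // happens-after the write of 42, so no data race
    });

    payload = 42;                                  // sequenced before the release store
    ready.store(true, std::memory_order_release);  // publishes payload

    consumer.join();  // join makes `observed` safe to read here
    return observed;
}
```

With `memory_order_relaxed` on both the store and the load, the read of `payload` would no longer be ordered after the write, and the access would be a data race.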

When you use memory_order_relaxed you don't get a guarantee of visibility from the relaxed store or load. You do get the first two guarantees.

Implementations "should" (i.e. are encouraged to) make memory writes visible within a reasonable amount of time, even with relaxed ordering. That's about the best that can be said; sooner or later these things should show up.

So, yes, formally, an implementation that never made relaxed writes visible to relaxed reads conforms to the language definition. In practice, this won't happen.
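As an illustration of the practical side, here is a minimal sketch in which one thread bumps an atomic counter with relaxed operations and the calling thread polls it with relaxed loads until it sees the final value. The names `wait_for_relaxed`, `counter` and `target` are hypothetical. Nothing here is a fence. Termination relies only on the standard's "should" about visibility, and mainstream implementations do honor it. Read-read coherence additionally guarantees that successive loads of the same atomic never go back to an older value.

```cpp
#include <atomic>
#include <thread>

// Writer increments with relaxed RMWs; the caller polls with relaxed loads
// until it observes `target`. Returns the value the poller last saw.
unsigned wait_for_relaxed(unsigned target) {
    std::atomic<unsigned> counter{0};

    std::thread writer([&] {
        for (unsigned i = 0; i < target; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    });

    unsigned seen = counter.load(std::memory_order_relaxed);
    while (seen != target) {
        // No fence: we depend on the implementation making the writer's
        // updates visible "within a reasonable amount of time".
        // Coherence ensures `seen` never decreases between iterations.
        seen = counter.load(std::memory_order_relaxed);
    }

    writer.join();
    return seen;
}
```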

As to what volatile does, ask your compiler vendor. It's up to the implementation.
