StoreStore reordering happens when compiling C++ for x86


Problem description

while(true) {
    int x(0), y(0);

    std::thread t0([&x, &y]() {
        x=1;
        y=3;
    });
    std::thread t1([&x, &y]() {
        std::cout << "(" << y << ", " << x << ")" << std::endl;
    });

    t0.join();
    t1.join();
}

Firstly, I know that it is UB because of the data race. But, I expected only the following outputs:

(3,1), (0,1), (0,0)



I was convinced that it was not possible to get (3,0), but I did. So I am confused: after all, x86 doesn't allow StoreStore reordering, so the store x=1 should become visible before the store y=3.

I suppose that from a theoretical point of view the output (3,0) is impossible because of the x86 memory model, so I suppose it appeared because of the UB. But I am not sure. Please explain.

What else besides StoreStore reordering could explain getting (3,0)?

Answer

You're writing in C++, which has a weak memory model. You didn't do anything to prevent reordering at compile-time.

If you look at the asm, you'll probably find that the stores happen in the opposite order from the source, and/or that the loads happen in the opposite order from what you expect.

The loads don't have any ordering in the source: the compiler can load x before y if it wants to, even if they were std::atomic types:

            t2 <- x(0)
t1 -> x(1)
t1 -> y(3)
            t2 <- y(3)

This isn't even "re"ordering, since there was no defined order in the first place:

std::cout << "(" << y << ", " << x << ")" << std::endl; doesn't necessarily evaluate y before x. The << operator has left-to-right associativity, and operator overloading is just syntactic sugar for

op<<( op<<(op<<(y),x), endl);  // omitting the string constants.

Since the order of evaluation of function arguments is undefined (even if we're talking about nested function calls), the compiler is free to evaluate x before evaluating op<<(y). IIRC, gcc often just evaluates right to left, matching the order of pushing args onto the stack if necessary. The answers on the linked question indicate that that's often the case. But of course that behaviour is in no way guaranteed by anything.

The order they're loaded is undefined even if they were std::atomic. I'm not sure if there's a sequence point between the evaluation of x and y. If not, then it would be the same as if you evaluated x+y: The compiler is free to evaluate the operands in any order because they're unsequenced. If there is a sequence point, then there is an order but it's undefined which order (i.e. they're indeterminately sequenced).
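
To make the evaluation-order point concrete, here is a minimal, self-contained sketch (the function names g, h and add are made up for illustration, not taken from the question). A conforming compiler may print the two trace lines in either order:

#include <cstdio>

int g() { std::puts("g evaluated"); return 1; }
int h() { std::puts("h evaluated"); return 2; }
int add(int a, int b) { return a + b; }

int main() {
    // The order in which g() and h() are evaluated is unspecified,
    // so either "g evaluated" or "h evaluated" may be printed first.
    return add(g(), h()) == 3 ? 0 : 1;
}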

Slightly related: gcc doesn't reorder non-inline function calls in expression evaluation, to take advantage of the fact that C leaves the order of evaluation unspecified. I assume after inlining it does optimize better, but in this case you haven't given it any reason to favour loading y before x.

The key point is that it doesn't matter exactly why the compiler decided to reorder, just that it's allowed to. If you don't impose all the necessary ordering requirements, your code is buggy, full-stop. It doesn't matter if it happens to work with some compilers with some specific surrounding code; that just means it's a latent bug.

See http://en.cppreference.com/w/cpp/atomic/atomic for docs on how/why this works:

// Safe version, which should compile to the asm you expected.

while(true) {
    int x(0);                  // should be atomic, too, because it can be read+written at the same time.  You can use memory_order_relaxed, though.
    std::atomic<int> y(0);

    std::thread t0([&x, &y]() {
        x=1;
        // std::atomic_thread_fence(std::memory_order_release);  // A StoreStore fence is an alternative to using a release-store
        y.store(3, std::memory_order_release);
    });
    std::thread t1([&x, &y]() {
        int tx, ty;
        ty = y.load(std::memory_order_acquire);
        // std::atomic_thread_fence(std::memory_order_acquire);  // A LoadLoad fence is an alternative to using an acquire-load
        tx = x;
        std::cout << ty + tx << "\n";   // Don't use endl, we don't need to force a buffer flush here.
    });

    t0.join();
    t1.join();
}

For Acquire/Release semantics to give you the ordering you want, the last store has to be the release-store, and the acquire-load has to be the first load. That's why I made y a std::atomic, even though you're setting x to 0 or 1 more like a flag.

If you don't want to use release/acquire, you could put a StoreStore fence between the stores and a LoadLoad fence between the loads. On x86, this would just prevent compile-time reordering, but on ARM you'd get a memory-barrier instruction. (Note that y still technically needs to be atomic to obey C's data-race rules, but you can use std::memory_order_relaxed on it.)
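
As a concrete illustration of that alternative, here is a sketch of the fence-based thread bodies (my adaptation, not code from the answer): y stays std::atomic so there is no data race on it, both accesses to it are relaxed, and the ordering comes from the thread fences instead. As discussed just below, x really ought to be atomic as well.

    int x(0);                  // as above: should really be atomic too, see below
    std::atomic<int> y(0);

    std::thread t0([&x, &y]() {
        x = 1;
        std::atomic_thread_fence(std::memory_order_release);   // the "StoreStore" side
        y.store(3, std::memory_order_relaxed);
    });
    std::thread t1([&x, &y]() {
        int ty = y.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);    // the "LoadLoad" side
        int tx = x;
        std::cout << "(" << ty << ", " << tx << ")\n";
    });

    t0.join();
    t1.join();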

Actually, even with Release/Acquire ordering for y, x should be atomic as well. The load of x still happens even if we see y==0, and in that case nothing synchronizes the read of x in thread 2 with the write of x in thread 1, so it's UB. In practice, int loads/stores on x86 (and most other architectures) are atomic. But remember that std::atomic implies other semantics, like the fact that the value can be changed asynchronously by other threads.
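
A minimal sketch of that fix (again my illustration, not code from the answer): declare x as std::atomic as well and use relaxed operations on it. The release/acquire pair on y still provides the inter-thread ordering; the relaxed accesses just remove the data race:

    std::atomic<int> x(0), y(0);

    // writer (thread t0):
    x.store(1, std::memory_order_relaxed);
    y.store(3, std::memory_order_release);

    // reader (thread t1):
    int ty = y.load(std::memory_order_acquire);
    int tx = x.load(std::memory_order_relaxed);   // no longer a racy plain load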

The hardware-reordering test could run a lot faster if you looped inside one thread storing i and -i or something, and looped inside the other thread checking that abs(y) is always >= abs(x). Creating and destroying two threads per test is a lot of overhead.
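
For what it's worth, here is one way such a looped test could be structured. This is only a sketch under my own assumptions: the iteration count, the direction of the invariant (tx >= ty, for stores done in the order x then y), and the use of std::atomic_signal_fence as a compile-time-only barrier are my choices, not part of the answer. On x86 the check should never fire; on a weakly ordered machine it can:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
constexpr int iterations = 10000000;   // arbitrary

int main() {
    std::thread writer([] {
        for (int i = 1; i <= iterations; ++i) {
            x.store(i, std::memory_order_relaxed);
            std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler barrier only: no fence instruction
            y.store(i, std::memory_order_relaxed);
        }
    });

    std::thread reader([] {
        for (int i = 0; i < iterations; ++i) {
            int ty = y.load(std::memory_order_relaxed);
            std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler barrier only
            int tx = x.load(std::memory_order_relaxed);
            if (tx < ty)   // x is stored first, so it should never lag behind y on x86
                std::printf("reordering observed: x=%d y=%d\n", tx, ty);
        }
    });

    writer.join();
    reader.join();
}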

Of course, to get this right, you have to know how to use C to generate the asm you want (or write in asm directly).
