Output 10 with memory_order_seq_cst


Problem Description

When I run this program I get 10 as output, which seems impossible to me. I'm running it on x86_64 (Core i3, Ubuntu).

If the output is 10, then 1 must have come from either c or d.

Also, in thread t[0] we assign c as 1. At that point a is 1, since a = 1 occurs before c = 1. c is equal to b, which was set to 1 by thread 1. So when we store d it should be 1, since a = 1.

  • Is an output of 10 possible with memory_order_seq_cst? I tried inserting atomic_thread_fence(seq_cst) in both threads between the first line (the variable = 1 store) and the second line, but it still didn't work.

Uncommenting both fences doesn't help. I tried running with g++ and clang++; both give the same result.

#include<thread>
#include<unistd.h>
#include<cstdio>
#include<atomic>
using namespace std;

atomic<int> a,b,c,d;

void foo(){
        a.store(1,memory_order_seq_cst);
//        atomic_thread_fence(memory_order_seq_cst);
        c.store(b,memory_order_seq_cst);   // implicit seq_cst load of b, then a seq_cst store of that int to c
}

void bar(){
        b.store(1,memory_order_seq_cst);
  //      atomic_thread_fence(memory_order_seq_cst);
        d.store(a,memory_order_seq_cst);   // implicit seq_cst load of a, then a seq_cst store of that int to d
}

int main(){
        thread t[2];
        t[0]=thread(foo); t[1]=thread(bar);
        t[0].join();t[1].join();
        printf("%d%d\n",c.load(memory_order_seq_cst),d.load(memory_order_seq_cst));
}

bash$ while [ true ]; do ./a.out | grep "10" ; done 
10
10
10
10

Answer

10 (c=1, d=0) is easily explained: bar happened to run first, and finished before foo read b.

Quirks of inter-core communication involved in getting threads started on different cores mean this can easily happen even though thread(foo) ran first in the main thread. For example, maybe an interrupt arrived at the core the OS chose for foo, delaying it from actually getting into that code (footnote 1).

Remember that seq_cst only guarantees that some total order exists for all seq_cst operations, one which is compatible with the sequenced-before order within each thread (and with any other happens-before relationships established by other factors). So the following order of atomic operations is possible, without even breaking out the a.load in bar (footnote 2) separately from the d.store of the resulting int temporary:

        b.store(1,memory_order_seq_cst);   // bar1.  b=1
        d.store(a,memory_order_seq_cst);   // bar2.  a.load reads 0, d=0

        a.store(1,memory_order_seq_cst);   // foo1
        c.store(b,memory_order_seq_cst);   // foo2.  b.load reads 1, c=1
// final: c=1, d=0

atomic_thread_fence(seq_cst) has no impact anywhere because all your operations are already seq_cst. A fence basically just stops reordering of this thread's operations; it doesn't wait for or sync with fences in other threads.

(Only a load that sees a value stored by another thread can create synchronization. But such a load doesn't wait for the other store; it has no way of knowing there is another store. If you want to keep loading until you see the value you expect, you have to write a spin-wait loop.)
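
For illustration (this sketch is not part of the original answer, and it reuses the question's globals and using-directive), a spin-wait version of foo that refuses to store c until it has observed bar's b = 1 could look like this:

void foo(){
        a.store(1,memory_order_seq_cst);
        // Hypothetical addition: keep loading until we see the value bar stored into b.
        while (b.load(memory_order_seq_cst) != 1) {
                // optionally std::this_thread::yield(); to be kinder to the scheduler
        }
        c.store(b,memory_order_seq_cst);   // now guaranteed to store 1
}

The original program has no such loop, so neither thread ever waits for the other's store.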

Footnote 1: Since all your atomic vars are probably in the same cache line, even if execution did reach the top of foo and bar at the same time on two different cores, false sharing is likely to let both operations from one thread happen while the other core is still waiting to get exclusive ownership of the line. Although seq_cst stores are slow enough (on x86 at least) that hardware fairness mechanisms might relinquish exclusive ownership after committing the first store of 1. Anyway, there are lots of ways for both operations in one thread to happen before the other thread's, giving 10 or 01. It's even possible to get 11 if we get b=1 and then a=1 before either load; using seq_cst does stop the hardware from doing each load early (before its own store is globally visible), so that outcome is very possible.
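
As an aside (not part of the original answer): if you wanted to rule out the false-sharing effect described in this footnote, one sketch is to pad each atomic onto its own cache line. The 64-byte line size and the PaddedAtomicInt wrapper below are assumptions for illustration; accesses would then go through the .v member (a.v.store(...), etc.):

#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines, typical for x86_64.
constexpr std::size_t kCacheLine = 64;

// Each instance is aligned (and therefore padded) to its own cache line, so one
// core committing b = 1 no longer contends with the other core's use of a, c, d.
struct alignas(kCacheLine) PaddedAtomicInt {
        std::atomic<int> v{0};
};

PaddedAtomicInt a, b, c, d;   // would replace the question's atomic<int> a,b,c,d;

Note that this only changes timing; the memory model still allows 10 (and 01 and 11) exactly as described above.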

Footnote 2: The lvalue-to-rvalue evaluation of bare a uses the overloaded operator int() conversion, which is equivalent to a.load(seq_cst). The operations from foo could happen between that load and the d.store that takes the resulting int temporary. d.store(a) is not an atomic copy; it's equivalent to int tmp = a; d.store(tmp);. (That isn't necessary to explain your observations, though.)
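
To make that concrete, here is roughly what bar expands to (a sketch reusing the question's globals; the name tmp is only for illustration):

void bar(){
        b.store(1,memory_order_seq_cst);
        int tmp = a;                       // the implicit operator int() conversion: a.load(seq_cst)
        // both of foo's stores could run right here, after tmp was already read as 0
        d.store(tmp,memory_order_seq_cst);
}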
