C11 Atomic Acquire/Release and x86_64 lack of load/store coherence?

Question

I am struggling with Section 5.1.2.4 of the C11 Standard, in particular the semantics of Release/Acquire. I note that https://preshing.com/20120913/acquire-and-release-semantics/ (amongst others) states that:

... Release semantics prevent memory reordering of the write-release with any read or write operation that precedes it in program order.

So, for the following:

#include <stdatomic.h>
#include <stdbool.h>

typedef struct test_struct
{
  _Atomic(bool) ready ;
  int  v1 ;
  int  v2 ;
} test_struct_t ;

extern void
test_init(test_struct_t* ts, int v1, int v2)
{
  ts->v1 = v1 ;
  ts->v2 = v2 ;
  atomic_store_explicit(&ts->ready, false, memory_order_release) ;
}

extern int
test_thread_1(test_struct_t* ts, int v2)
{
  int v1 ;
  while (atomic_load_explicit(&ts->ready, memory_order_acquire)) ;
  ts->v2 = v2 ;       // expect write to happen before store/release
  v1     = ts->v1 ;   // expect read to happen before store/release
  atomic_store_explicit(&ts->ready, true, memory_order_release) ;
  return v1 ;
}

extern int
test_thread_2(test_struct_t* ts, int v1)
{
  int v2 ;
  while (!atomic_load_explicit(&ts->ready, memory_order_acquire)) ;
  ts->v1 = v1 ;
  v2     = ts->v2 ;   // expect read to happen after store/release in thread "1"
  atomic_store_explicit(&ts->ready, false, memory_order_release) ;
  return v2 ;
}

Where those are executed:

>   in the "main" thread:  test_struct_t ts ;
>                          test_init(&ts, 1, 2) ;
>                          start thread "2" which does: r2 = test_thread_2(&ts, 3) ;
>                          start thread "1" which does: r1 = test_thread_1(&ts, 4) ;

I would, therefore, expect thread "1" to have r1 == 1 and thread "2" to have r2 == 4.

I would expect that because (following paras 16 and 18 of sect 5.1.2.4):

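  • all the (non-atomic) reads and writes are "sequenced before", and hence "happen before", the atomic write/release in thread "1",
  • which "inter-thread happens before" the atomic read/acquire in thread "2" (when it reads "true"),
  • which in turn is "sequenced before", and hence "happens before", the (non-atomic) reads and writes in thread "2".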
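
The expectation above can be checked with a small harness. This is only a sketch under stated assumptions: it uses POSIX threads (the question does not say which thread API was used), and the names handoff_t, worker_t, worker_1, worker_2 and run_handoff are made up for illustration. The two workers mirror test_thread_1 and test_thread_2.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the layout mirrors test_struct_t from the question. */
typedef struct {
  _Atomic(bool) ready;
  int v1;
  int v2;
} handoff_t;

typedef struct {
  handoff_t *ts;
  int arg;      /* value this thread writes */
  int result;   /* value this thread reads */
} worker_t;

/* Mirrors test_thread_1: runs while ready == false, then sets it to true. */
static void *worker_1(void *p) {
  worker_t *w = p;
  while (atomic_load_explicit(&w->ts->ready, memory_order_acquire)) ;
  w->ts->v2 = w->arg;
  w->result = w->ts->v1;
  atomic_store_explicit(&w->ts->ready, true, memory_order_release);
  return NULL;
}

/* Mirrors test_thread_2: waits for ready == true, then sets it to false. */
static void *worker_2(void *p) {
  worker_t *w = p;
  while (!atomic_load_explicit(&w->ts->ready, memory_order_acquire)) ;
  w->ts->v1 = w->arg;
  w->result = w->ts->v2;
  atomic_store_explicit(&w->ts->ready, false, memory_order_release);
  return NULL;
}

/* Runs one hand-off; returns 0 on success, -1 if a thread could not start. */
int run_handoff(int *r1, int *r2) {
  handoff_t ts;
  ts.v1 = 1;
  ts.v2 = 2;
  atomic_init(&ts.ready, false);

  worker_t w1 = { &ts, 4, 0 };
  worker_t w2 = { &ts, 3, 0 };
  pthread_t t1, t2;

  if (pthread_create(&t2, NULL, worker_2, &w2) != 0) return -1;
  if (pthread_create(&t1, NULL, worker_1, &w1) != 0) {
    /* Unblock thread 2 so it can be joined. */
    atomic_store_explicit(&ts.ready, true, memory_order_release);
    pthread_join(t2, NULL);
    return -1;
  }
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  *r1 = w1.result;
  *r2 = w2.result;
  return 0;
}
```

Calling run_handoff(&r1, &r2) from main should leave r1 == 1 and r2 == 4 if the reasoning in the bullets holds. A run that passes does not prove the ordering, of course; it only fails to contradict it.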

However, it is entirely possible that I have failed to understand the standard.

I observe that the code generated for x86_64 includes:

test_thread_1:
  movzbl (%rdi),%eax      -- atomic_load_explicit(&ts->ready, memory_order_acquire)
  test   $0x1,%al
  jne    <test_thread_1>  -- while is true
  mov    %esi,0x8(%rdi)   -- (W1) ts->v2 = v2
  mov    0x4(%rdi),%eax   -- (R1) v1     = ts->v1
  movb   $0x1,(%rdi)      -- (X1) atomic_store_explicit(&ts->ready, true, memory_order_release)
  retq   

test_thread_2:
  movzbl (%rdi),%eax      -- atomic_load_explicit(&ts->ready, memory_order_acquire)
  test   $0x1,%al
  je     <test_thread_2>  -- while is false
  mov    %esi,0x4(%rdi)   -- (W2) ts->v1 = v1
  mov    0x8(%rdi),%eax   -- (R2) v2     = ts->v2   
  movb   $0x0,(%rdi)      -- (X2) atomic_store_explicit(&ts->ready, false, memory_order_release)
  retq   

And provided that R1 and X1 happen in that order, this gives the result I expect.

But my understanding of x86_64 is that reads happen in order with other reads and writes happen in order with other writes, but reads and writes may not happen in order with each other. Which implies it is possible for X1 to happen before R1, and even for X1, X2, W2, R1 to happen in that order -- I believe. [This seems desperately unlikely, but if R1 were held up by some cache issues ?]

Please: what am I not understanding?

I note that if I change the loads/stores of ts->ready to memory_order_seq_cst, the code generated for the stores is:

  xchg   %cl,(%rdi)

which is consistent with my understanding of x86_64 and will give the result I expect.

Answer

x86's memory model is basically sequential consistency plus a store buffer (with store forwarding). So every store is a release-store (see footnote 1). This is why only seq-cst stores need any special instructions (see "C/C++11 atomics mappings to asm"). Also, https://stackoverflow.com/tags/x86/info has some links to x86 docs, including a formal description of the x86-TSO memory model (basically unreadable for most humans; requires wading through a lot of definitions).

Since you're already reading Jeff Preshing's excellent series of articles, I'll point you at another one that goes into more detail: https://preshing.com/20120930/weak-vs-strong-memory-models/

The only reordering that's allowed on x86 is StoreLoad, not LoadStore, if we're talking in those terms. So the store X1 cannot become globally visible before the earlier load R1 has taken its value. (Store forwarding can do extra fun stuff if a load only partially overlaps a store; see "Globally Invisible load instructions", although you'll never get that in compiler-generated code for stdatomic.)
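
The StoreLoad case can be illustrated with the classic store-buffer litmus test: each thread stores to its own flag and then loads the other thread's flag. What follows is a rough sketch, not a rigorous test; it assumes POSIX threads, and sb_arg_t, sb_worker and count_both_zero are hypothetical names. With release stores and acquire loads, both threads reading 0 is permitted (that is StoreLoad reordering). With seq_cst on every operation it is forbidden by the C11 model.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper types for a store-buffer litmus test. */
typedef struct {
  atomic_int *mine;    /* location this thread stores to */
  atomic_int *other;   /* location this thread loads afterwards */
  bool seq_cst;        /* true: seq_cst ops, false: release/acquire */
  int seen;            /* value loaded from *other */
} sb_arg_t;

static void *sb_worker(void *p) {
  sb_arg_t *a = p;
  if (a->seq_cst) {
    atomic_store_explicit(a->mine, 1, memory_order_seq_cst);
    a->seen = atomic_load_explicit(a->other, memory_order_seq_cst);
  } else {
    /* The later load may take its value while the store is still
       sitting in the store buffer: StoreLoad reordering. */
    atomic_store_explicit(a->mine, 1, memory_order_release);
    a->seen = atomic_load_explicit(a->other, memory_order_acquire);
  }
  return NULL;
}

/* Returns how many runs ended with both threads seeing 0,
   or -1 if a thread could not be created. */
int count_both_zero(bool seq_cst, int iterations) {
  int both_zero = 0;
  for (int i = 0; i < iterations; i++) {
    atomic_int x, y;
    atomic_init(&x, 0);
    atomic_init(&y, 0);
    sb_arg_t a = { &x, &y, seq_cst, -1 };
    sb_arg_t b = { &y, &x, seq_cst, -1 };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, sb_worker, &a) != 0) return -1;
    if (pthread_create(&tb, NULL, sb_worker, &b) != 0) {
      pthread_join(ta, NULL);
      return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    if (a.seen == 0 && b.seen == 0) both_zero++;
  }
  return both_zero;
}
```

count_both_zero(true, n) should always return 0. count_both_zero(false, n) may return a non-zero count on x86, although with threads created per iteration the window is small and it will often be 0 in practice. Note that the question's code never depends on StoreLoad ordering, only on LoadStore and StoreStore, which x86 preserves.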

@EOF commented with the right quote from Intel's manual:

Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3 (3A, 3B, 3C & 3D): System Programming Guide, 8.2.3.3 Stores Are Not Reordered With Earlier Loads.

Footnote 1: ignoring weakly-ordered NT stores; this is why you normally sfence after doing NT stores. C11 / C++11 implementations assume you aren't using NT stores. If you are, use _mm_sfence before a release operation to make sure it respects your NT stores. (In general don't use _mm_mfence / _mm_sfence in other cases; usually you only need to block compile-time reordering. Or of course just use stdatomic.)
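
As a sketch of what footnote 1 describes, assuming an x86-64 target with a compiler that provides <immintrin.h>: the payload is written with a non-temporal store, and _mm_sfence is issued before the release store of the flag. The names nt_box_t, nt_publish and nt_consume are hypothetical, and on other targets the sketch falls back to a plain store.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__)
#include <immintrin.h>   /* _mm_stream_si32, _mm_sfence */
#define USE_NT_STORE 1
#else
#define USE_NT_STORE 0   /* assumption: no NT stores on other targets */
#endif

/* Hypothetical one-slot mailbox. */
typedef struct {
  _Atomic(bool) ready;
  int payload;
} nt_box_t;

void nt_publish(nt_box_t *box, int value) {
#if USE_NT_STORE
  _mm_stream_si32(&box->payload, value);  /* weakly-ordered NT store */
  _mm_sfence();                           /* make it ordered before the flag store */
#else
  box->payload = value;                   /* ordinary store; release is enough */
#endif
  atomic_store_explicit(&box->ready, true, memory_order_release);
}

int nt_consume(nt_box_t *box) {
  while (!atomic_load_explicit(&box->ready, memory_order_acquire)) ;
  return box->payload;
}
```

Without the _mm_sfence, the plain mov that implements the release store would not be guaranteed to stay behind the NT store, because the compiler's mapping of memory_order_release assumes ordinary stores only.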
