Using memory barriers to force in-order execution

Question

I am continuing with my idea that, by using both software and hardware memory barriers, I could disable out-of-order optimization for a specific function inside code that is compiled with compiler optimization. That would let me implement a software semaphore using an algorithm such as Peterson's or Dekker's, which requires that operations are not executed out of order. I tested the following code, which contains both the SW barrier asm volatile("": : :"memory") and the GCC builtin HW barrier __sync_synchronize:

#include <stdio.h>
int main(int argc, char ** argv)
{
    int x=0;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=1;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=2;
    asm volatile("": : :"memory");
    __sync_synchronize();
    x=3;
    printf("%d",x);
    return 0;
}

But the compilation output file is:

main:
.LFB24:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    mfence
    mfence
    movl    $3, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    mfence
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp

And if I remove the barriers and compile again, I get:

main:
.LFB24:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $3, %edx
    movl    $.LC0, %esi
    movl    $1, %edi
    xorl    %eax, %eax
    call    __printf_chk
    xorl    %eax, %eax
    addq    $8, %rsp

Both were compiled with gcc -Wall -O2 on Ubuntu 14.04.1 LTS, x86.

I expected the output for the code with the memory barriers to contain every assignment from my source code, with an mfence between them.

According to a related StackOverflow post -

gcc memory barrier __sync_synchronize vs asm volatile("": : :"memory")

When adding your inline assembly on each iteration, gcc is not permitted to change the order of the operations past the barrier

And later on:

However, when the CPU performs this code, it's permitted to reorder the operations "under the hood", as long as it does not break memory ordering model. This means that performing the operations can be done out of order (if the CPU supports that, as most do these days). A HW fence would have prevented that.

But as you can see, the only difference between the code with the memory barriers and the code without them is that the former contains mfence instructions, placed in a way I did not expect, and not all of the assignments are included.

Why is the output for the code with the memory barriers not what I expected? Why has the order of the mfence instructions been altered? Why did the compiler remove some of the assignments? Is the compiler allowed to make such optimizations even when a memory barrier is applied and separates every single line of code?


Answer

The memory barriers tell the compiler/CPU that instructions shouldn't be reordered across the barrier. They don't mean that writes that can be proven pointless have to be done anyway.

If you define your x as volatile, the compiler can't assume that it's the only entity that cares about x's value. It has to follow the rules of the C abstract machine, which require the memory writes to actually happen.

In your specific case you could then skip the barriers, because the compiler is already guaranteed not to reorder volatile accesses against each other.

If you have C11 support, you are better off using _Atomic variables, which additionally can guarantee that normal assignments won't be reordered against accesses to your x and that the accesses are atomic.


EDIT: GCC (as well as clang) seems to be inconsistent in this regard and won't always do this optimization. I opened a GCC bug report regarding this.
