Using memory barriers to force in-order execution

Views: 536 | Published: 2016/12/22 13:53:17 | Tags: c gcc assembly compilation memory-barriers


Problem description

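I am continuing with my idea that, by using both software and hardware memory barriers, I could disable out-of-order optimization for a specific function in code that is compiled with compiler optimization. That would let me implement a software semaphore using algorithms such as Peterson's or Dekker's, which require that there be no out-of-order execution. I tested the following code, which contains both the SW barrier asm volatile("" ::: "memory") and the gcc builtin HW barrier __sync_synchronize: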

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int x = 0;
        asm volatile("" ::: "memory");   /* SW (compiler) barrier */
        __sync_synchronize();            /* HW barrier */
        x = 1;
        asm volatile("" ::: "memory");
        __sync_synchronize();
        x = 2;
        asm volatile("" ::: "memory");
        __sync_synchronize();
        x = 3;
        printf("%d", x);
        return 0;
    }

But the compiled output is:

    main:
    .LFB24:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        mfence
        mfence
        movl    $3, %edx
        movl    $.LC0, %esi
        movl    $1, %edi
        xorl    %eax, %eax
        mfence
        call    __printf_chk
        xorl    %eax, %eax
        addq    $8, %rsp


And if I remove the barriers and compile again, I get:

    main:
    .LFB24:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $3, %edx
        movl    $.LC0, %esi
        movl    $1, %edi
        xorl    %eax, %eax
        call    __printf_chk
        xorl    %eax, %eax
        addq    $8, %rsp

Both were compiled with gcc -Wall -O2.

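The expected result was that the output for the code containing the memory barriers would include all the assignments of the values in my source code, with an mfence between them.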

According to a related StackOverflow post, "gcc memory barrier __sync_synchronize vs asm volatile("": : :"memory")":

"When adding your inline assembly on each iteration, gcc is not permitted to change the order of the operations past the barrier."

And later on:

"However, when the CPU performs this code, it's permitted to reorder the operations 'under the hood', as long as it does not break the memory ordering model. This means that performing the operations can be done out of order (if the CPU supports that, as most do these days). A HW fence would have prevented that."

But as you can see, on my setup (Ubuntu 14.04.1 LTS, x86) the only difference between the code with the memory barriers and the code without them is that the former contains mfence instructions, placed in a way I did not expect, and that not all the assignments are included.

Why is the output for the version with the memory barriers not what I expected? Why has the order of the mfence instructions been altered? Why did the compiler remove some of the assignments? Is the compiler allowed to make such optimizations even when a memory barrier separates every single line of code?

References on memory barrier types and usage:

Memory Barriers - http://bruceblinn.com/linuxinfo/MemoryBarriers.html
GCC Builtins - https://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Atomic-Builtins.html

Solution

The memory barriers tell the compiler/CPU that instructions shouldn't be reordered across the barrier; they don't mean that writes that can be proven pointless have to be performed anyway. Here x is a local variable whose address never escapes, so nothing else can observe its intermediate values, and the stores of 1 and 2 can be dropped.

If you define your x as volatile, the compiler can't assume that it is the only entity that cares about x's value, and it has to follow the rules of the C abstract machine, which require the memory write to actually happen.
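As a minimal sketch of the volatile suggestion, the barriers can be dropped and x declared volatile. The function name store_sequence_volatile is hypothetical and simply mirrors the body of main from the question. Compiling this with gcc -O2 -S, one would expect every store to show up in the output in source order, though the exact instructions depend on the compiler version and target:

```c
/* Hypothetical helper mirroring main() from the question, minus the barriers. */
int store_sequence_volatile(void)
{
    /* volatile: each access below is an observable side effect, so the
       compiler may neither drop the stores nor reorder them relative to
       one another. */
    volatile int x = 0;
    x = 1;
    x = 2;
    x = 3;
    return x;   /* a volatile read of the last stored value */
}
```

Note that volatile only constrains the compiler; it does not by itself make the accesses atomic or order them against non-volatile data.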
In your specific case you could then skip the barriers, because volatile accesses are already guaranteed not to be reordered against each other by the compiler. If you have C11 support, you are better off using _Atomic variables, which can additionally guarantee that normal assignments won't be reordered against your x and that the accesses are atomic.

EDIT: GCC (as well as clang) seems to be inconsistent in this regard and won't always perform this optimization. I opened a GCC bug report regarding this.
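The C11 suggestion might look like the following sketch. The names payload, ready, publish and try_consume are made up for illustration and do not come from the question. atomic_store and atomic_load from <stdatomic.h> default to sequentially consistent ordering, so the ordinary write to payload cannot be moved after the store to ready, and the read of payload cannot be moved before the load of ready:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* static storage: zero-initialized, i.e. false */

/* Writer side: fill in the data, then raise the flag. */
void publish(int value)
{
    payload = value;             /* normal assignment */
    atomic_store(&ready, true);  /* seq_cst store: earlier writes stay before it */
}

/* Reader side: returns true and copies the data once the flag is seen. */
bool try_consume(int *out)
{
    if (!atomic_load(&ready))    /* seq_cst load: later reads stay after it */
        return false;
    *out = payload;
    return true;
}
```

A writer thread would call publish(42) while a reader polls try_consume(&v). The same pattern of sequentially consistent flag accesses is what a Peterson-style lock would need for its flag and turn variables.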


                    