GCC's reordering of read/write instructions

Question

Linux's synchronization primitives (spinlock, mutex, RCU) use memory barrier instructions to prevent memory access instructions from being reordered. This reordering can be done either by the CPU itself or by the compiler.

Can someone show some examples of GCC-generated code where such reordering is done? I am interested mainly in x86. The reason I am asking is to understand how GCC decides which instructions can be reordered. Different x86 microarchitectures (e.g., Sandy Bridge vs. Ivy Bridge) use different cache architectures. Hence I am wondering how GCC performs reordering that helps execution performance irrespective of the cache architecture. Some example C code and the reordered GCC-generated code would be very useful. Thanks!

Solution

The reordering that GCC may do is unrelated to the reordering an (x86) CPU may do.

Let's start off with compiler reordering. The C language rules are such that GCC is forbidden from reordering volatile loads and stores with respect to each other, or deleting them, when a sequence point occurs between them (thanks to bobc for this clarification). That is to say, those memory accesses will appear in the assembly output, sequenced precisely in the order you specified. Non-volatile accesses, on the other hand, can be reordered with respect to all other accesses, volatile or not, provided that (by the as-if rule) the end result of the calculation is the same.

For instance, a non-volatile load in the C code could be done as many times as the code says, but in a different order (e.g., if the compiler finds it more convenient to do it earlier or later, when more registers are available). It could be done fewer times than the code says (e.g., if a copy of the value happens to still be available in a register in the middle of a large expression). Or it could even be deleted entirely (e.g., if the compiler can prove the load is useless, or if it moved a variable entirely into a register).
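A minimal sketch of the kind of code where this freedom matters. The names A, B, VA, VB, foo, foo_volatile and check_same_result are illustrative, not from the answer. In foo, GCC at -O2 may load B, store 0 to B, and only then store B + 1 into A, because no code in this thread can tell the difference. In foo_volatile, the two statements are separated by a sequence point, so the store to VB must come after the accesses in the first statement. Whether your particular GCC version actually reorders foo depends on version and flags; comparing the gcc -O2 -S output of the two functions shows it.

```c
#include <assert.h>

/* Plain globals: GCC may reorder, merge or drop these accesses as long as
   the result seen by this thread is unchanged (the as-if rule). */
int A, B;

/* GCC is allowed to emit the store "B = 0" before the store to A. */
void foo(void) {
    A = B + 1;
    B = 0;
}

/* Volatile globals: the load of VB and the store to VA in the first
   statement must be emitted before the store to VB in the second. */
volatile int VA, VB;

void foo_volatile(void) {
    VA = VB + 1;
    VB = 0;
}

/* Single-threaded, both versions compute the same result; only the
   order of the memory accesses in the generated code may differ. */
void check_same_result(void) {
    B = 5;
    foo();
    assert(A == 6 && B == 0);

    VB = 5;
    foo_volatile();
    assert(VA == 6 && VB == 0);
}
```

The reordering only becomes visible when another thread (or a device, or a signal handler) looks at A and B while foo runs, which is exactly the case the synchronization primitives in the question exist for.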

To prevent compiler reordering in all other cases, you must use a compiler-specific barrier. GCC uses __asm__ __volatile__("":::"memory"); for this purpose.
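Here is a sketch of that barrier wrapped in a macro. The name compiler_barrier and the payload/published variables are hypothetical; the Linux kernel's barrier() macro is defined essentially the same way. The empty asm emits no instruction at all. Its "memory" clobber tells GCC that memory may have been read or written, so GCC must not keep memory values cached in registers across it and must not move loads or stores past it. Because it emits no fence, it does nothing to stop the CPU from reordering. Sharing plain ints between threads like this is also a data race under the C11 memory model, so treat this as an illustration of compiler behavior only; portable code should use C11 atomics or the kernel's own primitives.

```c
#include <assert.h>

/* Compiler-only barrier: no instruction, but GCC may not reorder memory
   accesses across it or cache memory values in registers across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int payload;   /* hypothetical data being published */
int published; /* hypothetical flag, set after payload is written */

void publish(int value) {
    payload = value;
    compiler_barrier(); /* payload store is emitted before the flag store */
    published = 1;
}

int wait_published(void) {
    while (!published)
        compiler_barrier(); /* forces 'published' to be reloaded each iteration */
    return payload;
}

/* Single-threaded self-check of the sketch. */
void check_publish(void) {
    publish(42);
    assert(wait_published() == 42);
}
```

Without the barrier in wait_published, GCC could load published once and turn the loop into an infinite loop. On x86 the store order in publish also happens to hold at the CPU level for ordinary stores, since writes are not reordered with other writes, but that is a property of the CPU, not of the barrier.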

This is different from CPU reordering, a.k.a. the memory-ordering model. Ancient CPUs executed instructions precisely in the order they appeared in the program; this is called program ordering, or the strong memory-ordering model. Modern CPUs, however, sometimes resort to "cheats" to run faster, by slightly weakening the memory model.

The way x86 CPUs weaken the memory model is documented in Intel's Software Developer Manuals, Volume 3, Chapter 8, Section 8.2.2 "Memory Ordering in P6 and More Recent Processor Families". This is, in part, what it reads:

  • Reads are not reordered with other reads.
  • Writes are not reordered with older reads.
  • Writes to memory are not reordered with other writes, with [some] exceptions.
  • Reads may be reordered with older writes to different locations but not with older writes to the same location.
  • Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
  • Reads cannot pass earlier LFENCE and MFENCE instructions.
  • Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
  • LFENCE instructions cannot pass earlier reads.
  • SFENCE instructions cannot pass earlier writes.
  • MFENCE instructions cannot pass earlier reads or writes.
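The rule "Reads may be reordered with older writes to different locations" is the one that bites ordinary code. The store-buffering pattern below is a minimal sketch of it, assuming POSIX threads and C11 stdatomic.h; all names are hypothetical. Each thread stores 1 to its own variable and then loads the other thread's variable. Without a fence, both loads may return 0. Relaxed atomics keep the program free of data races without adding ordering of their own, so the compiler may also reorder in that case. With a sequentially consistent fence between the store and the load, C11 rules out r1 == r2 == 0. On x86, GCC typically implements that fence with MFENCE or a locked instruction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int x, y;  /* the two shared locations */
static int r1, r2;       /* values each thread observed */
static bool use_fence;   /* set before the threads start */

static void *thread1(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst); /* store -> load fence */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the litmus test 'iterations' times and returns how often both
   threads read 0, i.e. each load was reordered before the older store. */
int count_both_zero(int iterations, bool fence) {
    int hits = 0;
    use_fence = fence;
    for (int i = 0; i < iterations; i++) {
        pthread_t t1, t2;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc1 = pthread_create(&t1, NULL, thread1, NULL);
        int rc2 = pthread_create(&t2, NULL, thread2, NULL);
        assert(rc1 == 0 && rc2 == 0);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            hits++;
    }
    return hits;
}
```

Build with gcc -O2 -pthread. Because each iteration starts fresh threads, the unfenced run may show the r1 == r2 == 0 outcome rarely or not at all; keeping two long-lived threads racing in a loop makes it far more likely to appear. With the fence, that outcome is forbidden.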

It also gives very good examples of what can and cannot be reordered, in Section 8.2.3 "Examples Illustrating the Memory-Ordering Principles".

As you can see, one uses fence instructions (or, per the list above, locked or serializing instructions) to prevent an x86 CPU from reordering memory accesses inappropriately.
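As a sketch of how those fence instructions are typically wrapped, here are three macros with hypothetical names, modeled on the Linux kernel's x86-64 mb(), rmb() and wmb(). Each emits the fence and, through the "memory" clobber, also acts as a compiler barrier. The fallback for non-x86-64 targets uses GCC's __sync_synchronize() full-barrier builtin. The producer/consumer pair is a kernel-style illustration using plain ints, which is outside the C11 memory model, and data and flag are hypothetical names.

```c
#include <assert.h>

#if defined(__x86_64__)
/* Earlier reads and writes cannot be reordered with later ones. */
#define full_fence()  __asm__ __volatile__("mfence" ::: "memory")
/* Later reads cannot pass it. */
#define read_fence()  __asm__ __volatile__("lfence" ::: "memory")
/* Later writes cannot pass it. */
#define write_fence() __asm__ __volatile__("sfence" ::: "memory")
#else
/* Fallback on other targets: GCC's full memory barrier builtin. */
#define full_fence()  __sync_synchronize()
#define read_fence()  __sync_synchronize()
#define write_fence() __sync_synchronize()
#endif

int data; /* hypothetical payload */
int flag; /* hypothetical "data is ready" flag */

/* Writer: data must be written before flag. */
void producer(int value) {
    data = value;
    write_fence();
    flag = 1;
}

/* Reader: flag must be read before data; returns -1 if not ready. */
int consumer(void) {
    if (!flag)
        return -1;
    read_fence();
    return data;
}

/* Single-threaded self-check of the sketch. */
void check_fences(void) {
    assert(consumer() == -1);
    producer(7);
    full_fence();
    assert(consumer() == 7);
}
```

Per the list above, ordinary x86 reads are not reordered with other reads, and ordinary writes are not reordered with other writes, so for normal write-back memory the LFENCE and SFENCE here mostly matter for the listed exceptions (such as non-temporal stores). I believe this is why the kernel's smp_rmb() and smp_wmb() reduce to compiler barriers on x86, while the store-followed-by-load case still needs a full fence or a locked instruction.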

Lastly, you may be interested in this link, which goes into further detail and comes with the assembly examples you crave.
