GCC's reordering of read/write instructions

Question

Linux's synchronization primitives (spinlocks, mutexes, RCU) use memory barrier instructions to prevent memory access instructions from being reordered. This reordering can be done either by the CPU itself or by the compiler.

Can someone show some examples of GCC-generated code where such reordering is done? I am interested mainly in x86. I am asking in order to understand how GCC decides which instructions can be reordered. Different x86 microarchitectures (for example, Sandy Bridge vs. Ivy Bridge) use different cache architectures, so I am wondering how GCC does effective reordering that helps execution performance irrespective of the cache architecture. Some example C code and the reordered GCC-generated code would be very useful. Thanks!

Recommended answer

The reordering that GCC may do is unrelated to the reordering an (x86) CPU may do.

Let's start with compiler reordering. The C language rules forbid GCC from reordering volatile loads and stores with respect to each other, or deleting them, when a sequence point occurs between them (thanks to bobc for this clarification). That is to say, those memory accesses will appear in the assembly output, sequenced precisely in the order you specified. Non-volatile accesses, on the other hand, can be reordered with respect to all other accesses, volatile or not, provided that (by the as-if rule) the end result of the calculation is the same.

For instance, a non-volatile load in the C code could be done as many times as the code says, but in a different order (e.g. if the compiler finds it more convenient to do it earlier, or later when more registers are available). It could be done fewer times than the code says (e.g. if a copy of the value happens to still be available in a register in the middle of a large expression). Or it could even be deleted (e.g. if the compiler can prove that the load is useless, or if it has moved the variable entirely into a register).
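As a minimal sketch of the difference (the function and variable names below are made up for illustration), consider two functions that do the same work on plain and on volatile globals. For the plain version, the as-if rule permits the compiler to emit the store to b before the store to a, because a single-threaded caller cannot tell the difference. For the volatile version, the load and both stores must appear in the assembly in source order. Whether a given GCC version actually reorders the plain version depends on the version and optimization flags; compiling with gcc -O2 -S and reading the output is the way to check.

```c
/* Plain (non-volatile) globals: accesses may be reordered by the compiler. */
int a, b;

/* Volatile counterparts: accesses must be emitted in source order. */
volatile int va, vb;

/* The compiler is allowed to emit "b = 0" before the store to a,
 * as long as a still receives the old value of b plus one. */
void plain_stores(void)
{
    a = b + 1;
    b = 0;
}

/* Here the load of vb, the store to va, and the store to vb must
 * all appear in the generated code, in exactly this order. */
void volatile_stores(void)
{
    va = vb + 1;
    vb = 0;
}
```

Both functions leave the same values in memory; the only difference is which instruction orders the compiler is allowed to choose.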

To prevent compiler reordering in other cases, you must use a compiler-specific barrier. GCC uses __asm__ __volatile__("":::"memory"); for this purpose.
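A minimal sketch of how such a barrier might be used follows. The macro name COMPILER_BARRIER and the payload/ready variables are hypothetical, chosen only for illustration. The empty asm statement emits no machine instruction; the "memory" clobber tells GCC that memory may have changed, so it cannot move memory accesses across that point or keep memory values cached in registers across it.

```c
/* Hypothetical wrapper around the GCC compiler barrier from the text. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int payload; /* data to publish */
int ready;   /* flag announcing that payload is valid */

void publish(int value)
{
    payload = value;
    /* The compiler may not sink the store to payload below this
     * point, nor hoist the store to ready above it. */
    COMPILER_BARRIER();
    ready = 1;
}
```

Note that this constrains the compiler only. It places no restriction on what the CPU may do at run time.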

This is different from CPU reordering, which is governed by the memory-ordering model. Ancient CPUs executed instructions precisely in the order they appeared in the program; this is called program ordering, or the strong memory-ordering model. Modern CPUs, however, sometimes resort to "cheats" to run faster, by slightly weakening the memory model.

The way x86 CPUs weaken the memory model is documented in Intel's Software Developer's Manual, Volume 3, Chapter 8, Section 8.2.2, "Memory Ordering in P6 and More Recent Processor Families". It reads, in part:

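  • Reads are not reordered with other reads.
  • Writes are not reordered with older reads.
  • Writes to memory are not reordered with other writes, with [some] exceptions.
  • Reads may be reordered with older writes to different locations but not with older writes to the same location.
  • Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
  • Reads cannot pass earlier LFENCE and MFENCE instructions.
  • Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
  • LFENCE instructions cannot pass earlier reads.
  • SFENCE instructions cannot pass earlier writes.
  • MFENCE instructions cannot pass earlier reads or writes.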

It also gives very good examples of what can and cannot be reordered, in Section 8.2.3 "Examples Illustrating the Memory-Ordering Principles".

As you can see, one uses FENCE instructions to prevent an x86 CPU from reordering memory accesses inappropriately.
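The one reordering the rules above allow for ordinary accesses is a read passing an older write to a different location. A rough sketch of guarding against it with MFENCE follows. The names full_fence, thread0, and thread1 are hypothetical, and the two functions are meant to run concurrently on different cores. The fallback to GCC's __sync_synchronize() builtin on non-x86 targets is an addition for portability, not something the answer describes.

```c
/* Hypothetical helper: a full hardware fence that also acts as a
 * compiler barrier thanks to the "memory" clobber. */
static inline void full_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize(); /* GCC builtin full barrier on other targets */
#endif
}

volatile int x, y; /* shared flags, both initially 0 */
int r1, r2;        /* what each side observed */

/* Intended to run on one core. Without the fence, the CPU may
 * perform the read of y before the write to x becomes visible. */
void thread0(void)
{
    x = 1;
    full_fence();
    r1 = y;
}

/* Intended to run on another core, mirroring thread0. */
void thread1(void)
{
    y = 1;
    full_fence();
    r2 = x;
}
```

Without the fences, both r1 and r2 can end up 0 when the functions run concurrently, because each read may pass the older write. With the fences in place, at least one of them observes 1.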

Lastly, you may be interested in this link, which goes into further detail and comes with the assembly examples you crave.
