Compiler reordering vs memory reordering

Question

Under gcc, the following statements are available for setting a memory barrier. The two provide different kinds of "protection":

asm volatile("" ::: "memory"); // compiler barrier: prevents compiler reordering only
asm volatile("mfence" ::: "memory"); // x86 hardware fence: also prevents memory reordering by the CPU

C++ atomics provide, in short:

- acquire/release semantics
- Sequentially-consistent ordering

I'm wondering whether there is a direct mapping between the gcc primitives and the C++ atomic semantics. For instance (this must be wrong, it is only here to illustrate the question): acquire/release semantics would prevent compiler reordering, and sequentially consistent ordering would prevent memory reordering.

Or maybe C++ doesn't make this distinction, and the language only offers semantics that apply to both kinds of reordering at the same time?

Recommended answer

The first barrier only applies during compilation. Once the compiler is done, it has no further effect, since no instruction is added to the generated code. It can still help avoid some memory ordering issues, because the compiler doesn't know how other threads may manipulate these memory locations (although hardly any compiler with normal settings would dare reorder accesses to variables where that is a possibility).
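As a rough illustration of a compiler-only barrier, here is a minimal sketch assuming GCC or Clang. The names payload_plain, ready_plain and publish_once are made up for this example. std::atomic_signal_fence is the standard C++11 facility that, like the empty asm statement, restricts compiler reordering without emitting a fence instruction.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical plain (non-atomic) variables, used only for illustration.
int payload_plain = 0;
int ready_plain = 0;

// GCC/Clang form from the question: constrains the compiler, emits no instruction.
inline void compiler_barrier_asm() {
    asm volatile("" ::: "memory");
}

// Standard C++11 form: also restricts only compiler reordering.
inline void compiler_barrier_std() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Writes the payload, then the flag. The barrier keeps the compiler from
// swapping the two stores; it does nothing about reordering done by the CPU.
void publish_once(int value) {
    assert(ready_plain == 0);  // this sketch publishes a single time
    payload_plain = value;
    compiler_barrier_std();    // compiler_barrier_asm() would serve the same role
    ready_plain = 1;
}
```

This is not enough to share the two variables between threads: the accesses are still non-atomic and the hardware can still reorder them on weakly ordered architectures.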

However, this is far from enough, since on modern out-of-order CPUs the hardware itself may reorder operations under the hood. To avoid that, there are ways to tell the hardware to watch out, depending on the exact level and form of restriction you want to achieve (sequential consistency being the most restrictive and "safe" ordering model, but usually also the most expensive in terms of performance).

To achieve these restrictions, you can try to manually maintain the barriers and similar constructs that the ISA provides (through intrinsics, inline assembly, serializing operations, or any other trick). This is usually complicated even if you know what you're doing, and may even be microarchitecture-specific (some CPUs grant some restrictions "for free", making explicit fencing useless). That is why C++11 added atomic semantics to make the task easier: the compiler now adds the necessary code for you, depending on the ordering model you specify.
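To make the "let the compiler add the code" idea concrete, here is a minimal sketch of release/acquire publication with std::atomic. The names shared_value, value_ready, produce and try_consume are hypothetical; the intent is that produce runs on one thread and try_consume on another.

```cpp
#include <atomic>
#include <cassert>

int shared_value = 0;                  // plain data being published
std::atomic<bool> value_ready{false};  // flag that publishes it

// Producer side: the release store orders the earlier write to shared_value
// before the flag becomes visible, for both the compiler and the hardware.
void produce(int value) {
    assert(!value_ready.load(std::memory_order_relaxed));  // single-shot sketch
    shared_value = value;
    value_ready.store(true, std::memory_order_release);
}

// Consumer side: if the acquire load observes the flag, the read of
// shared_value that follows is guaranteed to see the producer's write.
bool try_consume(int& out) {
    if (value_ready.load(std::memory_order_acquire)) {
        out = shared_value;
        return true;
    }
    return false;
}
```

The same source is meant to work on every target: the compiler picks whatever instructions, if any, the architecture needs for the requested ordering.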

In your example, the mfence is a case of doing things manually, but you also need to know where to apply it. Used correctly, mfence can be strong enough to provide sequential consistency, but it is also very expensive: it acts as a store fence as well as a load fence (it covers what sfence and lfence do, and additionally orders earlier stores against later loads), so it has to drain all pending stores from the store buffer. That is a slow operation, because the buffering exists precisely to let stores commit lazily.

On the other hand, if you want acquire/release semantics, you can choose to implement them with the proper partial fences at the correct places for your architecture, or let the compiler do that for you. If you choose the latter and run on an x86 machine, for example, you'll discover that most of the time nothing needs to be added, since stores have implicit release semantics and loads have implicit acquire semantics. The same may not hold on other architectures.
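The fence-based variant can be sketched with std::atomic_thread_fence, which is the portable way to ask for a partial or full fence. The names data_word, flag_word, publish_with_fence, read_with_fence and full_fence are made up for this example, and the notes on x86 code generation describe typical compiler behavior rather than a guarantee.

```cpp
#include <atomic>
#include <cassert>

int data_word = 0;              // plain data
std::atomic<int> flag_word{0};  // accessed with relaxed operations plus fences

// Release fence before a relaxed store: earlier writes cannot move below it.
// On x86 this typically compiles to no fence instruction at all.
void publish_with_fence(int value) {
    assert(flag_word.load(std::memory_order_relaxed) == 0);  // single-shot sketch
    data_word = value;
    std::atomic_thread_fence(std::memory_order_release);
    flag_word.store(1, std::memory_order_relaxed);
}

// Acquire fence after a relaxed load: later reads cannot move above it.
bool read_with_fence(int& out) {
    if (flag_word.load(std::memory_order_relaxed) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        out = data_word;
        return true;
    }
    return false;
}

// Full fence: the closest portable counterpart to the hand-written mfence.
// On x86 compilers typically emit mfence or a locked instruction here.
void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

Only the seq_cst fence is expected to cost a real barrier on x86, which matches the point that acquire/release usually comes for free there while sequential consistency does not.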

Here's a nice summary of how the various ordering semantics are implemented on each architecture: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
