x86: Are memory barriers needed here?

Question

In WB (write-back) memory, initially a = b = 0

P1:
a = 1
SFENCE
b = 1

P2:
WHILE (b == 0) {}
LFENCE
ASSERT (a == 1)

据我了解,这里不需要 SFENCELFENCE.

It is my understanding, that neither the SFENCE or LFENCE are needed here.

That is because, for this memory type, x86 ensures:

  1. Reads are not reordered with older reads
  2. Stores are not reordered with older stores
  3. Stores are transitively visible
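A minimal C++ sketch of the same P1/P2 test, assuming C++11 std::atomic and std::thread; the names p1, p2 and run_message_passing are hypothetical. Release/acquire is the portable way to request the ordering the question relies on, and on x86 (as typically compiled by GCC/Clang) both operations become plain mov instructions with no fence, which matches the guarantees listed above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical rendering of the question's litmus test.
int a = 0;              // plain data written by P1
std::atomic<int> b{0};  // flag written by P1, polled by P2

void p1() {
    a = 1;                                  // plain store
    b.store(1, std::memory_order_release);  // earlier stores may not pass this one
}

int p2() {
    while (b.load(std::memory_order_acquire) == 0) {
        // spin until P1's flag store becomes visible
    }
    return a;  // having seen b == 1, this must read a == 1
}

// Runs one round and returns the value of `a` that P2 observed.
int run_message_passing() {
    a = 0;
    b.store(0, std::memory_order_relaxed);  // reset before the threads start
    int seen = -1;
    std::thread t2([&] { seen = p2(); });
    std::thread t1(p1);
    t1.join();
    t2.join();
    return seen;
}
```

Calling run_message_passing() repeatedly should always return 1; no SFENCE or LFENCE appears anywhere in the x86 code generated for p1 and p2.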

Recommended answer

lfencesfence asm 指令是无操作的,除非您使用 NT 存储(或NT 从 WC 内存加载,例如视频 RAM).(实际上,movntdqa 加载可能只能由纸上的 mfence 排序,而不是 lfence.在这种情况下,我不知道你什么时候会使用 lfence.在 movntdqa 之前,它与 sfence + mfence 一起添加到 ISA,同时作为 NT 存储,可能只是为了完整性/以防万一.)

The lfence and sfence asm instructions are no-ops unless you're using NT stores (or NT loads from WC memory, e.g. video RAM). (Actually, movntdqa loads might only be ordered by mfence on paper, not lfence. In which case I don't know when you'd ever use lfence. It was added to the ISA along with sfence + mfence at the same time as NT stores, before movntdqa, possibly just for completeness / in case it was ever needed.)
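For contrast, here is a sketch of a case where sfence does matter: NT (non-temporal) stores are weakly ordered even to WB memory, so a flag published after them needs an sfence first. This is an x86-only sketch using the SSE2 intrinsics _mm_stream_si32 and _mm_sfence; buffer, ready and publish_nt are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <immintrin.h>  // _mm_stream_si32, _mm_sfence (SSE2, x86 only)

int buffer[4];              // hypothetical payload written with NT stores
std::atomic<int> ready{0};  // hypothetical flag

void publish_nt(int v) {
    for (int i = 0; i < 4; ++i) {
        _mm_stream_si32(&buffer[i], v);  // NT store: not ordered like normal WB stores
    }
    // Without this, the flag store below could become visible before the NT stores.
    _mm_sfence();
    // A release store is a plain mov on x86; the sfence above is what orders the NT stores.
    ready.store(1, std::memory_order_release);
}
```

If buffer were written with ordinary stores instead, the sfence could be dropped and the release store alone would be enough on x86.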

有时会混淆这一点,因为lfencesfence 的 C/C++ 内在函数也是编译器障碍. 在 C/C++ 中是需要的,但可以用 GNU C asm("":::"memory"); 或(订购轻松-<代码>原子操作1) std::atomic_signal_fence(std::memory_order_acq_rel).限制编译时重新排序而无需使编译器发出任何无用的 asm 屏障指令.

There is sometimes confusion around this point, because the C/C++ intrinsics for lfence and sfence are also compiler barriers. That is needed in C/C++, but can be had more cheaply with GNU C asm("":::"memory"); or (to order relaxed-atomic operations1) std::atomic_signal_fence(std::memory_order_acq_rel). Restricts compile-time reordering without making the compiler emit any useless asm barrier instructions.
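A sketch of the two cheaper compiler barriers, assuming GCC or Clang for the inline-asm form. plain_data, atomic_data, ready and the publish_* functions are hypothetical. Both barriers only restrict what the compiler may reorder; the inter-thread ordering still comes from x86 keeping stores in program order, so this is not a portable ISO C++ guarantee (portable code would use memory_order_release on the flag store).

```cpp
#include <atomic>
#include <cassert>

int plain_data = 0;               // non-atomic payload
std::atomic<int> atomic_data{0};  // relaxed-atomic payload
std::atomic<int> ready{0};        // flag

// GNU C compiler-only barrier: emits no instruction, but the "memory"
// clobber stops the compiler from moving memory accesses across it.
inline void compiler_barrier() { __asm__ __volatile__("" ::: "memory"); }

void publish_plain(int v) {
    plain_data = v;
    compiler_barrier();  // keep the payload store before the flag store
    ready.store(1, std::memory_order_relaxed);
}

void publish_relaxed(int v) {
    atomic_data.store(v, std::memory_order_relaxed);
    // Orders the surrounding relaxed atomics at compile time only;
    // no fence instruction is emitted.
    std::atomic_signal_fence(std::memory_order_acq_rel);
    ready.store(1, std::memory_order_relaxed);
}
```

Note that a second thread reading plain_data after seeing ready == 1 would still be a data race by the letter of ISO C++; the asm form is a GNU C idiom, not a standard guarantee.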

Run-time reordering is already blocked by the x86 memory model, except for StoreLoad reordering, which requires mfence to block. lfence + sfence don't add up to mfence. See "Does it make any sense instruction LFENCE in processors x86/x86_64?" and various other SO Q&As about these instructions.
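The StoreLoad case can be sketched with the classic store-buffer (Dekker-style) litmus test, assuming C++11 atomics; x, y and store_buffer_round are hypothetical names. With seq_cst operations the outcome r1 == 0 && r2 == 0 is forbidden, and compilers typically implement a seq_cst store on x86 as xchg or mov + mfence, i.e. a full barrier. With plain mov stores and loads (what relaxed, release and acquire compile to on x86) that outcome can be observed, and putting lfence + sfence between the store and the load would not rule it out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

std::atomic<int> x{0}, y{0};

// One round of the store-buffer test: each thread stores to one variable,
// then loads the other. Returns {r1, r2}.
std::pair<int, int> store_buffer_round() {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);  // full barrier on x86 (xchg or mov + mfence)
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return {r1, r2};
}
```

With seq_cst, every round yields at least one 1; downgrading all four operations to relaxed removes the barrier and makes {0, 0} possible on real x86 hardware.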

This is why std::atomic_thread_fence(std::memory_order_acq_rel) also compiles to zero instructions on x86, but to barriers on weakly-ordered architectures.
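A sketch of fence-based message passing, assuming C++11; payload, flag, producer, consumer and run_fence_pair are hypothetical names. The producer's fence is sequenced before its relaxed flag store, and the consumer's fence after the relaxed load that reads it, so the two fences synchronize. On x86 each std::atomic_thread_fence(std::memory_order_acq_rel) compiles to no instruction at all; on weakly-ordered ISAs such as ARM it becomes a real barrier instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                // plain, non-atomic data
std::atomic<bool> flag{false};

void producer() {
    payload = 42;
    // Acts as a release fence here: a compiler barrier only on x86.
    std::atomic_thread_fence(std::memory_order_acq_rel);
    flag.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!flag.load(std::memory_order_relaxed)) {
        // spin until the flag is set
    }
    // Acts as an acquire fence here, pairing with the producer's fence.
    std::atomic_thread_fence(std::memory_order_acq_rel);
    return payload;
}

// Runs one round and returns the payload the consumer observed.
int run_fence_pair() {
    payload = 0;
    flag.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```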

lfence is also a serializing instruction on Intel microarchitectures (but maybe not on AMD?), in the sense that later instructions don't begin executing until earlier ones have completed locally. It has been all along, but Intel recently made this guarantee official so that Spectre mitigation techniques could safely use it instead of the much more inconvenient cpuid.
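A sketch of the Spectre-v1 style use of lfence, assuming an x86 target and the _mm_lfence() intrinsic from <immintrin.h>; load_checked is a hypothetical helper. The lfence after the bounds check keeps the array load from executing until the comparison has actually completed, so a mispredicted branch cannot speculatively read out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>  // _mm_lfence (x86 only)

// Hypothetical bounds-checked read hardened against Spectre variant 1.
std::uint8_t load_checked(const std::uint8_t* arr, std::size_t len, std::size_t i) {
    if (i < len) {
        // lfence waits for earlier instructions (the bounds check) to complete
        // locally, and later instructions (the load) don't start until it does.
        _mm_lfence();
        return arr[i];
    }
    return 0;
}
```

Architecturally the result is the same with or without the lfence; it only changes what may execute speculatively.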

  • Footnote 1:

On gcc, atomic_signal_fence may also be a compiler barrier for plain non-atomic variables; it was the last time I checked (while atomic_thread_fence wasn't), but this is probably just an implementation detail when no atomic variables are involved. When there are atomic variables, the compiler knows that they may provide the ordering that lets other threads access non-atomic variables without UB, so ordering is needed.