Does `xchg` encompass `mfence` assuming no non-temporal instructions?

Question

I have already seen this answer and this answer, but neither appears to be clear and explicit about the equivalence or non-equivalence of mfence and xchg under the assumption of no non-temporal instructions.

The Intel instruction reference for xchg mentions that this instruction is useful for implementing semaphores or similar data structures for process synchronization, and further references Chapter 8 of Volume 3A. That reference states the following.

For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.
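
The manual's remark that xchg is useful for semaphores and similar synchronization structures can be illustrated with a test-and-set spinlock. The following is a minimal C++11 sketch and is not taken from the question. The names `XchgSpinlock` and `increment_under_lock` are hypothetical. The sketch assumes that `std::atomic<bool>::exchange` is lowered to a (implicitly locked) `xchg` on x86-64, which is what mainstream compilers typically do; confirm this against your compiler's assembly output.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock (hypothetical name). exchange() is an atomic
// read-modify-write; on x86-64 compilers typically emit `xchg` for it.
class XchgSpinlock {
public:
    void lock() {
        // Swap in `true` until the previous value shows the lock was free.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Wait with plain loads so we don't hammer the line with RMWs;
            // yield so an oversubscribed machine lets the holder run.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo helper: `threads` threads each increment a plain
// (non-atomic) counter `per_thread` times while holding the lock.
long increment_under_lock(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    XchgSpinlock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;  // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

For example, `increment_under_lock(4, 5000)` should return 20000, because every `++counter` happens while the lock is held. Without the lock, the data race would make the result undefined.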

The mfence documentation states the following.

Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction. The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any LFENCE and SFENCE instructions, and any serializing instructions (such as the CPUID instruction). MFENCE does not serialize the instruction stream.

If we ignore weakly ordered memory types, does xchg (which implies lock) encompass all of mfence's guarantees with respect to memory ordering?

Answer

Assuming you're not writing a device-driver (so all the memory is Write-Back, not weakly-ordered Write-Combining), then yes xchg is as strong as mfence.

NT (non-temporal) stores are fine.

I'm sure that this is the case on current hardware, and fairly sure that this is guaranteed by the wording in the manuals for all future x86 CPUs. xchg is a very strong full memory barrier.
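
The store-buffer (Dekker) litmus test shows concretely what a full barrier buys. Each thread stores to one location and then loads the other. x86 allows a plain store followed by a load to be reordered (StoreLoad reordering through the store buffer), so both loads can return 0. Placing an xchg, or an mfence, between the store and the load forbids that outcome.

The harness below is an illustrative C++11 sketch, not code from the answer, and its function names are hypothetical. The mapping to instructions is what mainstream x86-64 compilers typically emit:

- seq_cst `exchange` becomes `xchg`.
- relaxed `store` plus seq_cst `atomic_thread_fence` usually becomes `mov` + `mfence`.

Check the generated assembly rather than relying on this mapping.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Shared locations for the store-buffer (SB / Dekker) litmus test.
static std::atomic<int> x{0}, y{0};

// One thread's half of the test: store 1 to `mine`, then read `other`.
//   use_xchg = true : seq_cst exchange (x86-64: typically `xchg`).
//   use_xchg = false: relaxed store + seq_cst fence (typically `mov` + `mfence`).
// Either way the C++ memory model forbids both threads reading 0.
static int store_then_load(std::atomic<int>& mine, std::atomic<int>& other,
                           bool use_xchg) {
    if (use_xchg) {
        mine.exchange(1, std::memory_order_seq_cst);
    } else {
        mine.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    return other.load(std::memory_order_seq_cst);
}

// Runs the test `iterations` times and counts the outcome r1 == 0 && r2 == 0,
// which would require StoreLoad reordering across the barrier.
int count_store_load_reorderings(int iterations, bool use_xchg) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        x.store(0, std::memory_order_relaxed);
        y.store(0, std::memory_order_relaxed);
        int r1 = -1, r2 = -1;
        std::thread t1([&] { r1 = store_then_load(x, y, use_xchg); });
        std::thread t2([&] { r2 = store_then_load(y, x, use_xchg); });
        t1.join();
        t2.join();
        assert(r1 >= 0 && r2 >= 0);  // both threads actually ran
        if (r1 == 0 && r2 == 0) ++forbidden;
    }
    return forbidden;
}
```

Both variants should always report 0. The harness spawns a fresh pair of threads per iteration, so the race window is narrow; it illustrates the guarantee rather than stress-testing it. If you drop the barrier (relaxed store, no fence), the 0/0 outcome becomes permitted, though this harness may rarely observe it.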

Hmm, I haven't looked at prefetch instruction reordering. That might possibly be relevant for performance, or possibly even correctness in weird device-driver situations (where you're using cacheable memory when you probably shouldn't be).

From your quote:

(P4/Xeon) Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.

That's the one thing that makes xchg [mem] weaker than mfence (on Pentium 4, and probably also on Sandybridge-family).

mfence does guarantee that, which is why Skylake had to strengthen it to fix an erratum. (See "Are loads and stores the only instructions that gets reordered?", and also the answer you linked on "Does lock xchg have the same behavior as mfence?")

NT stores are serialized by xchg / lock, it's only weakly-ordered loads that may not be serialized. You can't do weakly-ordered loads from WB memory. movntdqa xmm, [mem] on WB memory is still strongly-ordered (and on current implementations, also ignores the NT hint instead of doing anything to reduce cache pollution).

It looks like xchg performs better for seq-cst stores than mov+mfence on current CPUs, so you should use that in normal code. (You can't accidentally map WC memory; normal OSes will always give you WB memory for normal allocations. WC is only used for video RAM or other device memory.)

These guarantees are specified in terms of specific families of Intel microarchitectures. It would be nice if there was some common "baseline x86" guarantees that we could assume for future Intel and AMD CPUs.

I assume but haven't checked that the xchg vs. mfence situation is the same on AMD. I'm sure there's no correctness problem with using xchg as a seq-cst store, because that's what compilers other than gcc actually do.
