Does the Intel Memory Model make SFENCE and LFENCE redundant?


Question

The Intel memory model guarantees:

  • Stores will not be reordered with other stores

  • Loads will not be reordered with other loads

http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

I have seen claims that SFENCE is redundant on x86-64 due to the Intel memory model, but never LFENCE. Do the above memory model rules make either instruction redundant?

Recommended answer

Yes, for memory-ordering purposes SFENCE and LFENCE are effectively no-ops, except when using movnt (Non-Temporal) streaming stores, or when working with memory regions whose type is set to something other than the normal Write-Back (WB). NT stores bypass the cache and are also weakly ordered, whereas x86's normal memory model is strongly ordered (except for Fast-String ops, see below). NT loads (movntdqa) from WB memory are still strongly ordered (http://stackoverflow.com/questions/35516878/acquire-release-semantics-with-non-temporal-stores-on-x64/35571387#35571387), so LFENCE is only useful when reading from weakly-ordered memory. This doesn't happen by accident in "normal" programs, so you only have to worry about it if you mmap video RAM or something similar.

This post: Memory Reordering Caught in the Act is an easier-to-read description of the same case Bartosz's post talks about, where you need a StoreLoad barrier like MFENCE.

If you still have questions after reading the link you posted, read Jeff Preshing's blog posts. They gave me a good understanding of the subject. :) Although I think I found the tidbit about SFENCE/LFENCE normally being a no-op on Doug Lea's page. Jeff's posts didn't consider NT loads/stores.

I got curious about this a couple of weeks ago, and posted a fairly detailed answer to a recent question: Atomic operations, std::atomic<> and ordering of writes (http://stackoverflow.com/questions/32384901/atomic-operations-stdatomic-and-ordering-of-writes/32394427#32394427). I included lots of links to material about the memory model of C++ vs. hardware memory models.

If you're writing in C++, using std::atomic<> is an excellent way to tell the compiler what ordering requirements you have, so it doesn't reorder your memory operations at compile time. You can and should use weaker release or acquire semantics where appropriate, instead of the default sequential consistency, so the compiler doesn't have to emit any barrier instructions at all on x86. It just has to keep the ops in source order.
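As a minimal sketch of that advice (the names `payload`, `ready`, and `publish_and_consume` are hypothetical, chosen only for illustration), a producer can publish plain data with a release store and a consumer can pick it up with an acquire load. On x86 both are expected to compile to ordinary mov instructions with no fence; the ordering request mainly constrains the compiler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: publishes one int through a release/acquire flag
// and returns the value the consumer thread observed.
int publish_and_consume() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that guards the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;  // write the data first
        // Release: the write to payload cannot be reordered after this store.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: the read of payload cannot be reordered before this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        assert(payload == 42);  // guaranteed by the release/acquire pairing
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Using the default `memory_order_seq_cst` for the store instead would still be correct, but on x86 it typically costs a full barrier that this pattern does not need.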

On a weakly ordered architecture like ARM or PPC, or x86 with movnt, you need a StoreStore barrier instruction between writing a buffer and setting a flag to indicate the data is ready. Also, the reader needs a LoadLoad barrier instruction between checking the flag and reading the buffer.
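The x86-with-movnt case can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `g_buf`, `g_ready`, `kWords`, `fill_and_publish`, and `wait_and_sum` are hypothetical names, and the non-x86 branch falls back to plain stores only so the example still builds elsewhere. The writer fills the buffer with NT stores (`_mm_stream_si32`, i.e. movnti) and issues `_mm_sfence()` as the StoreStore barrier before setting the flag. The reader uses ordinary loads from write-back memory, so it needs no LFENCE.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // _mm_stream_si32, _mm_sfence
#define HAVE_NT_STORES 1
#else
#define HAVE_NT_STORES 0
#endif

constexpr std::size_t kWords = 1024;  // hypothetical buffer size
alignas(64) int g_buf[kWords];
std::atomic<bool> g_ready{false};

// Writer: fill the buffer, then publish it via the flag.
void fill_and_publish(int value) {
    for (std::size_t i = 0; i < kWords; ++i) {
#if HAVE_NT_STORES
        _mm_stream_si32(&g_buf[i], value);  // weakly ordered NT store
#else
        g_buf[i] = value;                   // portable fallback: plain store
#endif
    }
#if HAVE_NT_STORES
    _mm_sfence();  // StoreStore barrier: NT stores become visible before the flag
#endif
    g_ready.store(true, std::memory_order_release);
}

// Reader: ordinary loads are strongly ordered on x86, so no LFENCE here.
long wait_and_sum() {
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    long sum = 0;
    for (std::size_t i = 0; i < kWords; ++i) {
        sum += g_buf[i];
    }
    return sum;
}
```

If the buffer were written with normal stores, the `_mm_sfence()` call could simply be dropped on x86, which is the sense in which SFENCE is redundant there.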

Not counting movnt, x86 already has LoadLoad barriers between every load, and StoreStore barriers between every store. (LoadStore ordering is also guaranteed). MFENCE is all 4 kinds of barriers, including StoreLoad, which is the only barrier x86 doesn't do by default. MFENCE makes sure loads don't use old prefetched values from before other threads saw your stores and potentially did stores of their own. (As well as being a barrier for NT store ordering and load ordering.)
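The StoreLoad case is the classic store-buffering litmus test. A hedged sketch (the function name `store_buffer_round` is hypothetical): each thread stores to one variable and then loads the other. Without a StoreLoad barrier both loads may return 0, even on x86. With a sequentially consistent fence between the store and the load in both threads, the C++ memory model forbids that outcome; on x86 compilers typically implement the fence as MFENCE or a lock-prefixed instruction.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs one round of the store-buffering litmus test
// and returns the two values loaded by the two threads.
std::pair<int, int> store_buffer_round() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        // StoreLoad barrier: the store above is ordered before the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });

    a.join();
    b.join();
    return {r1, r2};
}
```

Removing the two fences makes the result `{0, 0}` legal; replacing them with SFENCE or LFENCE would not help, since neither provides StoreLoad ordering.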

Fun fact: x86 lock-prefixed instructions are also full memory barriers. They can be used as a substitute for MFENCE in old 32-bit code that might run on CPUs that don't support it. lock add [esp], 0 is otherwise a no-op, and it does its read/modify/write cycle on memory that is very likely hot in L1 cache and already in the M state of the MESI coherency protocol.

SFENCE is a StoreStore barrier.

LFENCE is a LoadLoad and also a LoadStore barrier. (loadNT / LFENCE / storeNT prevents the store from becoming globally visible before the load. I think this could happen in practice if the load address was the result of a long dependency chain, or the result of another load that missed in cache.)

Fun fact #2 (thanks @EOF): The stores from Fast-String Ops (rep stosb / rep movsb on IvyBridge and later) are weakly-ordered (but not cache-bypassing).

Intel documents the fact that Fast-String Ops "may appear to execute out of order" in section 7.3.9.3 of their Software Developer's Manual, vol. 1. They also say:

"Order-dependent code should write to a discrete semaphore variable after any string operations to allow correctly ordered data to be seen by all processors"

They don't mention any barrier instructions. The way I read it, there's an implicit SFENCE after rep stosb / rep movsb (at least a fence for the string data, probably not for other in-flight weakly ordered NT stores). Anyway, the wording implies that a write to the flag / semaphore becomes globally visible after all the string-move writes, so no SFENCE / LFENCE is needed in code that fills a buffer with a fast-string op and then writes a flag, or in code that reads it. (LoadLoad ordering always happens, so you always see data in the order in which other CPUs made it globally visible. That is, using weakly-ordered stores to write a buffer doesn't change the fact that loads in other threads are still strongly ordered.)

Summary: use a normal store to write a flag indicating that a buffer is ready. Don't have readers just check the last byte of the block written with memset/memcpy. However, I think fast-string stores prevent any later stores from passing them, so you still only need SFENCE/LFENCE if you're using movNT.

There's a CPU MSR bit that can be cleared to disable fast-string ops, for the benefit of new servers that need to run old binaries that write a "data ready" flag as part of a rep stosb or rep movsb.
