Can someone provide an easy explanation of how 'Full Fences' are implemented in .NET using Threading.MemoryBarrier?

Question

I'm clear on the usage of MemoryBarrier, but not on what happens behind the scenes in the runtime. Can anyone give a good explanation of what goes on?
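For context, here is a minimal usage sketch of my own (the type and field names are made up, not from the question): Thread.MemoryBarrier() is typically placed between a plain store and a plain load to keep a simple publish/consume pattern correct.

    // Minimal publish/consume sketch; illustrative names only.
    using System;
    using System.Threading;

    class PublishSketch
    {
        static int payload;
        static bool ready;

        static void Publisher()
        {
            payload = 42;
            Thread.MemoryBarrier(); // full fence: the store to payload cannot move past the store to ready
            ready = true;
        }

        static void Consumer()
        {
            if (ready)
            {
                Thread.MemoryBarrier(); // full fence: the load of ready completes before the load of payload
                Console.WriteLine(payload); // prints 42 whenever ready was observed as true
            }
        }
    }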

Answer

In a really strong memory model, emitting fence instructions would be unnecessary. All memory accesses would execute in order and all stores would be globally visible.

Memory fences are needed because current common architectures do not provide a strong memory model - x86/x64, for example, can reorder reads relative to writes. (A more thorough source is the Intel® 64 and IA-32 Architectures Software Developer's Manual, Section 8.2.2, "Memory Ordering in P6 and More Recent Processor Families".) As one example among gazillions, Dekker's algorithm will fail on x86/x64 without fences.
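To make the Dekker example concrete, here is a rough C# sketch of the entry/exit protocol for thread 0 (thread 1 mirrors it with the roles of the flags and turn swapped). The names are illustrative and this is not code from the original answer; the point is only where the full fence has to sit.

    using System.Threading;

    class DekkerSketch
    {
        // volatile gives acquire/release ordering, but a volatile write followed by a
        // volatile read of a different location can still be reordered on x86/x64,
        // which is exactly why the explicit full fences below are needed.
        static volatile bool flag0, flag1;
        static volatile int turn;

        static void Enter0()
        {
            flag0 = true;
            Thread.MemoryBarrier(); // the store to flag0 must be visible before flag1 is read
            while (flag1)
            {
                if (turn != 0)
                {
                    flag0 = false;
                    while (turn != 0) { /* spin until it is our turn */ }
                    flag0 = true;
                    Thread.MemoryBarrier(); // same reasoning as above
                }
            }
            // critical section follows
        }

        static void Exit0()
        {
            turn = 1;
            flag0 = false;
        }
    }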

Even if the JIT produces machine code in which instructions with memory loads and stores are carefully placed, its efforts are useless if the CPU then reorders these loads and stores - which it can, as long as the illusion of sequential consistency is maintained for the current context/thread.

Risking oversimplification: it may help to visualize the loads and stores resulting from the instruction stream as a thundering herd of wild animals. As they cross a narrow bridge (your CPU), you can never be sure about the order of the animals, since some of them will be slower, some faster, some overtake, some fall behind. If at the start - when you emit the machine code - you partition them into groups by putting infinitely long fences between them, you can at least be sure that group A comes before group B.

Fences ensure the ordering of reads and writes. The wording is not exact, but roughly:

  • A store fence waits for all outstanding store (write) operations to complete, but does not affect loads.
  • A load fence waits for all outstanding load (read) operations to complete, but does not affect stores.
  • A full fence waits for all store and load operations to complete. Its effect is that reads and writes issued before the fence are performed before the writes and loads that sit on the other side of the fence (i.e. later than the fence). (A concrete sketch follows this list.)
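To illustrate the full-fence bullet, here is the canonical store-then-load example as a sketch of my own (not from the original answer): without the two fences, both r1 and r2 may end up 0, because each CPU can serve its load before its own buffered store becomes globally visible.

    using System.Threading;

    class StoreLoadSketch
    {
        static int x, y;
        static int r1, r2;

        static void ThreadA()
        {
            x = 1;
            Thread.MemoryBarrier(); // wait for the store to x before loading y
            r1 = y;
        }

        static void ThreadB()
        {
            y = 1;
            Thread.MemoryBarrier(); // wait for the store to y before loading x
            r2 = x;
        }
    }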

What the JIT emits for a full fence depends on the (CPU) architecture and the memory-ordering guarantees it provides. Since the JIT knows exactly what architecture it runs on, it can issue the proper instruction(s).

On my x64 machine, with .NET 4.0 RC, it happens to be a lock or instruction:

            int a = 0;
00000000  sub         rsp,28h 
            Thread.MemoryBarrier();
00000004  lock or     dword ptr [rsp],0 
            Console.WriteLine(a);
00000009  mov         ecx,1 
0000000e  call        FFFFFFFFEFB45AB0 
00000013  nop 
00000014  add         rsp,28h 
00000018  ret 

Intel® 64 and IA-32 Architectures Software Developer's Manual, Section 8.1.2:

  • "...locked operations serialize all outstanding load and store operations (that is, wait for them to complete)." ..."Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor."

Memory-ordering instructions address this specific need. MFENCE could have been used as a full barrier in the above case (at least in theory - for one, locked operations might be faster, and for two, it might result in different behavior). MFENCE and its friends can be found in Section 8.2.5, "Strengthening or Weakening the Memory-Ordering Model".
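As a related side note of my own (not part of the original answer): Interlocked operations in .NET are also implemented with lock-prefixed instructions on x86/x64, so besides atomicity they carry full-fence semantics. A rough sketch, with illustrative names:

    using System;
    using System.Threading;

    class InterlockedSketch
    {
        static int payload;
        static int ready;

        static void Publisher()
        {
            payload = 42;
            Interlocked.Exchange(ref ready, 1); // atomic store that also acts as a full barrier
        }

        static void Consumer()
        {
            // CompareExchange with identical comparand and value is an atomic read
            // that likewise acts as a full barrier.
            if (Interlocked.CompareExchange(ref ready, 0, 0) == 1)
            {
                Console.WriteLine(payload);
            }
        }
    }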

There are some more ways to serialize stores and loads, though they are either impractical or slower than the above methods:

  • In chapter 8.3 you can find full serializing instructions like CPUID. These serialize instruction flow as well: "Nothing can pass a serializing instruction and a serializing instruction cannot pass any other instruction (read, write, instruction fetch, or I/O)".

If you set memory up as strong uncacheable (UC), it will give you a strong memory model: no speculative or out-of-order accesses will be allowed and all accesses will appear on the bus, so there is no need to emit a fence instruction. :) Of course, this will be a tad slower than usual.

...

So it depends. If there were an architecture with strong enough ordering guarantees, the JIT would probably emit nothing.

IA64 and other architectures have their own memory models - and thus their own memory-ordering guarantees (or lack thereof) - and their own instructions and ways of dealing with store/load ordering.
