x86 memory ordering: Loads Reordered with Earlier Stores vs. Intra-Processor Forwarding

Question

I am trying to understand section 8.2 of Intel's System Programming Guide (that's Vol 3 in the PDF).

In particular, I see two different reordering scenarios:

8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations

8.2.3.5 Intra-Processor Forwarding Is Allowed

However, I do not understand the difference between these scenarios from the point of view of observable effects. The examples provided in those sections seem interchangeable to me. The 8.2.3.4 example can be explained by the 8.2.3.5 rule just as well as by its own rule. The converse seems true to me as well, although I am less sure in that case.

So here is my question: are there better examples or explanations of how the observable effects of 8.2.3.4 differ from the observable effects of 8.2.3.5?

Recommended answer

The example at 8.2.3.5 should be "surprising" if you expect memory ordering to be all strict and clean, even if you acknowledge that 8.2.3.4 allows loads to reorder with earlier stores to different addresses.

   Processor 0      |      Processor 1
  --------------------------------------
   mov [x],1        |      mov [y],1
   mov R1, [x]      |      mov R3,[y]
   mov R2, [y]      |      mov R4,[x]

Note that the key point is that the newly added loads in the middle both return 1 (store-to-load forwarding makes that possible in the microarchitecture without stalling). So in theory, you would expect both stores to have been "observed" globally by the time both of these loads complete (that would be the case under sequential consistency, where there is a single global order of stores and all cores see it).

However, the fact that R2 = R4 = 0 is still a valid outcome afterwards proves this is not the case: the stores are in fact observed locally first. In other words, allowing this outcome means that processor 0 sees the stores ordered as time(x) < time(y), while processor 1 sees the opposite.
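
The forwarding behaviour described above can be sketched with a toy model in which each processor has a private FIFO store buffer. This is only an illustration under simplifying assumptions, not a description of how any real core is built: the `Core` class, its method names, and the hand-picked schedule are all hypothetical. It shows one schedule in which both middle loads return 1 via forwarding while R2 and R4 still read 0 from shared memory.

```python
from collections import deque


class Core:
    """Hypothetical processor with a private FIFO store buffer (toy TSO model)."""

    def __init__(self, memory):
        self.memory = memory          # shared dict: address -> value
        self.store_buffer = deque()   # pending (address, value) pairs, oldest first
        self.regs = {}

    def store(self, addr, value):
        # The store is buffered locally; other cores cannot see it yet.
        self.store_buffer.append((addr, value))

    def load(self, reg, addr):
        # Store-to-load forwarding: the newest buffered store to addr wins.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                self.regs[reg] = value
                return
        # No matching buffered store, so read shared memory.
        self.regs[reg] = self.memory[addr]

    def drain(self):
        # Commit buffered stores to shared memory in program order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


memory = {"x": 0, "y": 0}
p0, p1 = Core(memory), Core(memory)

# One possible schedule: both cores execute all three instructions
# before either store buffer drains to shared memory.
p0.store("x", 1)
p1.store("y", 1)
p0.load("R1", "x")   # forwarded from p0's own buffer
p1.load("R3", "y")   # forwarded from p1's own buffer
p0.load("R2", "y")   # p1's store is still buffered, so memory has 0
p1.load("R4", "x")   # p0's store is still buffered, so memory has 0
p0.drain()
p1.drain()

print(p0.regs, p1.regs, memory)
```

In this model each core sees its own store immediately but sees the other core's store only after it drains, which is exactly the disagreement about store order that the answer points out.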

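This is a very important observation about the consistency of this memory model, which the previous example doesn't prove. This nuance is the biggest difference between Sequential Consistency and Total Store Ordering: a processor may see its own store before that store becomes visible to the other processors. Strictly speaking, the 8.2.3.4 outcome is not sequentially consistent either, but it can be explained by a load simply passing an earlier store to a different location. The 8.2.3.5 outcome cannot be explained that way, because each processor's first load reads the very location it has just written, and loads are not reordered with other loads.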
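
As a rough sanity check of the claim that sequential consistency forbids this outcome, the sketch below enumerates every interleaving of the two instruction streams from the example, with each store made visible to everyone immediately. The tuple encoding of the instructions and the helper names `run_sc` and `interleavings` are assumptions made for this illustration.

```python
from itertools import combinations

# The two instruction streams from the example, encoded as (op, address, arg).
P0 = [("store", "x", 1), ("load", "x", "R1"), ("load", "y", "R2")]
P1 = [("store", "y", 1), ("load", "y", "R3"), ("load", "x", "R4")]


def run_sc(schedule):
    """Run one interleaving with every store immediately visible to all cores."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs


def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's program order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        a_iter, b_iter = iter(a), iter(b)
        yield [next(a_iter) if i in a_slots else next(b_iter) for i in range(n)]


# Collect every (R1, R2, R3, R4) result reachable under sequential consistency.
outcomes = {
    (r["R1"], r["R2"], r["R3"], r["R4"])
    for r in map(run_sc, interleavings(P0, P1))
}
sc_allows_it = (1, 0, 1, 0) in outcomes

print(sorted(outcomes))
print("R1=R3=1, R2=R4=0 reachable under SC:", sc_allows_it)
```

No interleaving yields R2 = R4 = 0 here: R2 = 0 needs processor 0's last load to precede processor 1's store, and R4 = 0 needs processor 1's last load to precede processor 0's store, which together contradict program order.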
