Analyzing the x86 output generated by the JIT in the context of volatile


Question


I am writing this post in connection with my earlier question, Deep understanding of volatile in Java.

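public class Main {
    private int x;
    private volatile int g;

    public void actor1(){
        x = 1;
        g = 1;
    }

    public void actor2(){
        put_on_screen_without_sync(g);
        put_on_screen_without_sync(x);
    }

    // Stand-in for the printing helper, which the question leaves undefined (hypothetical).
    private static void put_on_screen_without_sync(int value) {
        System.out.println(value);
    }
}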

Now, I am analyzing what the JIT generated for the above piece of code. From our discussion in my previous post, we know that the output 1, 0 is impossible because:


A write to volatile v means that every action a preceding v will be visible (will be flushed to memory) before v is visible.
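
To check the 1, 0 claim empirically, here is a minimal litmus-style harness for actor1/actor2. It is a sketch under assumptions: the names MessagePassingLitmus, Shared and run are hypothetical, and each round races two freshly started threads on a fresh object. A count of zero is consistent with the Java Memory Model forbidding (g, x) == (1, 0); it does not prove it, and a dedicated tool such as jcstress is the usual way to run such tests.

```java
import java.util.concurrent.CountDownLatch;

// Litmus-style harness for the question's actor1/actor2 (hypothetical names).
class MessagePassingLitmus {
    static final class Shared {
        int x;          // plain field, as in the question
        volatile int g; // volatile field, as in the question
    }

    // Runs the race `rounds` times; returns how often (g, x) == (1, 0) was observed.
    static int run(int rounds) {
        int forbidden = 0;
        for (int i = 0; i < rounds; i++) {
            Shared s = new Shared();
            int[] seen = new int[2];
            CountDownLatch start = new CountDownLatch(1);
            Thread actor1 = new Thread(() -> {
                awaitQuietly(start);
                s.x = 1; // plain store
                s.g = 1; // volatile store: x = 1 may not sink below it
            });
            Thread actor2 = new Thread(() -> {
                awaitQuietly(start);
                seen[0] = s.g; // volatile load: the read of x may not float above it
                seen[1] = s.x; // plain load
            });
            actor1.start();
            actor2.start();
            start.countDown(); // release both threads at roughly the same time
            joinQuietly(actor1);
            joinQuietly(actor2); // join() makes seen[] safely visible here
            if (seen[0] == 1 && seen[1] == 0) forbidden++;
        }
        return forbidden;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + run(10_000));
    }
}
```

Starting new threads per round makes tight interleavings rare, so this mainly documents the expectation rather than stress-testing it.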


   ... (I removed the unimportant part of the method body) ...

  0x00007f42307d9d5e: c7460c01000000     (1) mov       dword ptr [rsi+0ch],1h
                                                ;*putfield x
                                                ; - package.Main::actor1@2 (line 14)

  0x00007f42307d9d65: bf01000000          (2) mov       edi,1h
  0x00007f42307d9d6a: 897e10              (3) mov       dword ptr [rsi+10h],edi
  0x00007f42307d9d6d: f083042400          (4) lock add  dword ptr [rsp],0h
                                                ;*putfield g
                                                ; - package.Main::actor1@7 (line 15)

  0x00007f42307d9d72: 4883c430            add       rsp,30h
  0x00007f42307d9d76: 5d                  pop       rbp
  0x00007f42307d9d77: 850583535116        test      dword ptr [7f4246cef100h],eax
                                                ;   {poll_return}
  0x00007f42307d9d7d: c3                  ret

Do I understand correctly that this works because x86 does not perform StoreStore reordering? If it could, an additional memory barrier would be required, right?


EDITED AFTER @Eugene's EXCELLENT ANSWER:

 int tmp = i; // volatile load
 // [LoadStore]
 // [LoadLoad]

Here, I see what you mean, and it is clear: no action below (after) the volatile read (int tmp = i) can be reordered above it.

 // [StoreLoad] -- this one
 int tmp = i; // volatile load
 // [LoadStore]
 // [LoadLoad]

Here, you put one more barrier. It ensures that no action will be reordered with int tmp = i. But why is it important? Why do I have doubts? From what I know, a volatile load guarantees:

Every action after volatile load won't be reordered before volatile load is visible.

I see you wrote:

There needs to be sequential consistency

But I cannot see why sequential consistency is required.

Solution

A couple of things. First, "will be flushed to memory" is pretty erroneous. It's almost never a flush to main memory: it usually drains the store buffer to L1, and it's up to the cache coherency protocol to sync the data between all caches. If it's easier for you to understand the concept in these terms, that's fine; just know that it is slightly different and faster.

It's a good question why the [StoreLoad] is there; maybe this will clear things up a bit. volatile is indeed all about fences, and here is an example of which barriers would be inserted for some volatile operations. For example, say we have a volatile load:

  // i is some shared volatile field
  int tmp = i; // volatile load of "i"
  // [LoadLoad|LoadStore]

Notice the two barriers here, LoadStore and LoadLoad. In plain English, this means that no Load or Store that comes after a volatile load/read can "move up" the barrier: they cannot be reordered "above" that volatile load.
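
The [LoadLoad|LoadStore] pair maps fairly directly onto VarHandle.acquireFence() (Java 9+), whose Javadoc says loads before the fence are not reordered with loads and stores after it. The sketch below is illustrative only: AcquireSketch and its fields are hypothetical, a plain read followed by acquireFence() gives acquire ordering but is weaker than a real volatile read (it lacks the [StoreLoad] discussed further down), and HotSpot does not literally compile a volatile read this way.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration of the barrier placement after a volatile load.
class AcquireSketch {
    int i;     // plain field standing in for the shared volatile "i"
    int other; // unrelated plain field

    int loadThenWork() {
        int tmp = i;              // the load that should act like a volatile load
        VarHandle.acquireFence(); // [LoadLoad|LoadStore]
        other = tmp + 1;          // this store cannot move above the load of i
        return other;
    }

    public static void main(String[] args) {
        AcquireSketch s = new AcquireSketch();
        s.i = 41;
        System.out.println(s.loadThenWork()); // single-threaded, so this prints 42
    }
}
```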

And here is the example for a volatile store.

 // "i" is a shared volatile variable
 // [StoreStore|LoadStore]
 i = tmp; // volatile store

It means that no Load or Store that comes before it can go "below" the volatile store itself.
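
Symmetrically, the [StoreStore|LoadStore] pair in front of a volatile store corresponds to VarHandle.releaseFence(), documented as keeping loads and stores before the fence from being reordered with stores after it. Again a hypothetical sketch (ReleaseSketch, data and publish are made-up names), not what the JIT actually emits:

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration of the barrier placement before a volatile store.
class ReleaseSketch {
    int data; // plain payload written before the "volatile" store
    int i;    // plain field standing in for the shared volatile "i"

    void publish(int tmp) {
        data = tmp * 2;           // earlier store: cannot sink below the store to i
        VarHandle.releaseFence(); // [StoreStore|LoadStore]
        i = tmp;                  // the store that should act like a volatile store
    }
}
```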

This basically builds the happens-before relationship, with the volatile load being the acquiring load and the volatile store being the releasing store (this also has to do with how the CPU's store and load buffers are implemented, but that's pretty much out of the scope of the question).

If you think about it, this fits perfectly with what we know about volatile in general: once a volatile store has been observed by a volatile load, everything prior to that volatile store will be observed too, and this is on par with the memory barriers. It makes sense now that when a volatile store takes place, nothing above it can move below it, and once a volatile load happens, nothing below it can move above it; otherwise this happens-before would be broken.
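
Java 9 VarHandles expose exactly this release/acquire pairing as access modes (setRelease/getAcquire), without the extra sequential-consistency guarantee of volatile. The sketch below, with hypothetical names ReleaseAcquireHandoff, produce, consume and handoff, hands a plain payload from one thread to another: once the consumer's getAcquire observes the value written by setRelease, everything written before the release (here payload) is visible to it.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical release/acquire handoff of a plain field between two threads.
class ReleaseAcquireHandoff {
    private int payload; // plain field
    private int ready;   // accessed only through READY with release/acquire modes

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquireHandoff.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void produce(int value) {
        payload = value;           // plain store
        READY.setRelease(this, 1); // releasing store: payload cannot sink below it
    }

    int consume() {
        while ((int) READY.getAcquire(this) == 0) { // acquiring load
            Thread.onSpinWait();
        }
        return payload; // sees the value written before setRelease
    }

    static int handoff(int value) {
        ReleaseAcquireHandoff h = new ReleaseAcquireHandoff();
        int[] result = new int[1];
        Thread consumer = new Thread(() -> result[0] = h.consume());
        consumer.start();
        h.produce(value);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```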

But that's not all. There needs to be sequential consistency; that is why any sane implementation guarantees that volatile accesses themselves are not reordered with each other, and thus two more fences are inserted:

 // any store of some other volatile
 // can not be reordered with this volatile load
 // [StoreLoad] -- this one
 int tmp = i; // volatile load of a shared variable "i"
 // [LoadStore|LoadLoad]

And one more here:

// [StoreStore|LoadStore]
i = tmp; // volatile store
// [StoreLoad] -- and this one
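
Putting the four fences together, a volatile store and load could be spelled out with VarHandle fences as below. This is a hypothetical sketch (the class and method names are made up): VarHandle.fullFence() orders all loads and stores on both sides, so it is used here as the [StoreLoad] barrier, and keeping it on both sides mirrors the two listings above even though implementations typically need only one of them. It is the single StoreLoad after the store that corresponds to the lock add right after the store to g in the question's listing.

```java
import java.lang.invoke.VarHandle;

// Hypothetical spelled-out version of the fences around volatile accesses.
class SequentiallyConsistentSketch {
    int i; // plain field standing in for the shared volatile "i"

    void volatileLikeStore(int tmp) {
        VarHandle.releaseFence(); // [StoreStore|LoadStore]
        i = tmp;
        VarHandle.fullFence();    // [StoreLoad] (a full fence also covers the other three)
    }

    int volatileLikeLoad() {
        VarHandle.fullFence();    // [StoreLoad] before the load, as in the load listing
        int tmp = i;
        VarHandle.acquireFence(); // [LoadStore|LoadLoad]
        return tmp;
    }
}
```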

Now, it turns out that on x86, 3 out of 4 memory barriers are free, since it has a strong memory model. The only one that needs to be implemented is StoreLoad. On other CPUs, such as PowerPC, lwsync is one of the instructions used (ARM uses its dmb family instead), but I don't know much about them.

Usually mfence is a good option for StoreLoad on x86, but the same guarantee is provided by a lock-prefixed instruction such as lock add (AFAIK in a cheaper way), which is why you see it there: lock add dword ptr [rsp],0h adds zero to the top of the stack, which changes nothing except acting as a full barrier. Basically, that is the StoreLoad barrier. And yes, you are right in your last sentence: for a weaker memory model, the StoreStore barrier would be required. On a side note, that is what is used when you safely publish a reference via final fields inside a constructor: upon exiting the constructor, two fences are inserted, LoadStore and StoreStore.
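
The final-field case can be exercised in a similar way. In this hypothetical harness (FinalFieldPublication, Holder and run are made-up names), a Holder is published through a plain, non-volatile static field, which is a data race. JLS §17.5 still guarantees that a thread that sees the reference also sees the final field as set by the constructor; how a JVM enforces that (for instance with the constructor-exit barriers mentioned above) is an implementation detail.

```java
// Hypothetical harness: racy publication of an object whose only field is final.
class FinalFieldPublication {
    static final class Holder {
        final int value; // final: guaranteed visible once the reference is seen
        Holder(int value) {
            this.value = value;
        }
    }

    static Holder shared; // plain field: publishing through it is a data race

    // Returns how often a reader saw a Holder whose final field was not 42.
    static int run(int rounds) {
        int bad = 0;
        for (int r = 0; r < rounds; r++) {
            shared = null;
            int[] seen = {-1}; // -1 means "reader saw null"
            Thread reader = new Thread(() -> {
                Holder h = shared; // racy read: may be null or the new Holder
                if (h != null) {
                    seen[0] = h.value;
                }
            });
            reader.start();
            shared = new Holder(42); // racy write of the reference
            try {
                reader.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            if (seen[0] != -1 && seen[0] != 42) bad++;
        }
        return bad;
    }
}
```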

Take all this with a grain of salt: a JVM is free to elide these barriers as long as it does not break any rules. Aleksey Shipilev has a great talk about this.


EDIT

Suppose you have this case:

// [StoreStore|LoadStore]
x = 4; // volatile store of a shared "x" variable

y = 3; // non-volatile store of shared variable "y"

int z = x; // volatile load
// [LoadLoad|LoadStore]

Basically, there is no barrier that would prevent the volatile store from being reordered with the volatile load (i.e. the volatile load being performed first), and that would obviously cause problems: sequential consistency would be violated.
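
This is the classic store-buffering (Dekker) pattern, and it is exactly where x86 needs the StoreLoad barrier: a CPU's store can still sit in its store buffer while its following load already executes. The hypothetical harness below (StoreBufferingLitmus and run are made-up names) makes both fields volatile, so the outcome where both threads read 0 is forbidden; with plain fields that outcome is allowed and can show up on x86. As before, zero observations is evidence, not proof.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical store-buffering (Dekker) litmus harness with volatile fields.
class StoreBufferingLitmus {
    static final class Shared {
        volatile int x;
        volatile int y;
    }

    // Runs the race `rounds` times; returns how often both threads read 0.
    static int run(int rounds) {
        int bothZero = 0;
        for (int i = 0; i < rounds; i++) {
            Shared s = new Shared();
            int[] r = new int[2];
            CountDownLatch start = new CountDownLatch(1);
            Thread t1 = new Thread(() -> {
                awaitQuietly(start);
                s.x = 1;    // volatile store
                r[0] = s.y; // volatile load: may not be performed before the store above
            });
            Thread t2 = new Thread(() -> {
                awaitQuietly(start);
                s.y = 1;    // volatile store
                r[1] = s.x; // volatile load
            });
            t1.start();
            t2.start();
            start.countDown();
            joinQuietly(t1);
            joinQuietly(t2); // join() makes r[] safely visible here
            if (r[0] == 0 && r[1] == 0) bothZero++;
        }
        return bothZero;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```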

By the way, you are sort of missing the point here (if I am not mistaken) with "Every action after volatile load won't be reordered before volatile load is visible." Operations cannot be reordered across the volatile access itself, but they are otherwise free to be reordered. Let me give you an example:

 int tmp = i; // volatile load of a shared variable "i"
 // [LoadStore|LoadLoad]

 x = 3; // plain store
 y = 4; // plain store

The last two operations, x = 3 and y = 4, are absolutely free to be reordered: they can't float above the volatile load, but they can be reordered among themselves. So the following version of the example above would be perfectly legal:

 int tmp = i; // volatile load
 // [LoadStore|LoadLoad]

 // see how they have been inverted here...
 y = 4; // plain store
 x = 3; // plain store

