Java JVM 中的指令重新排序 [英] Instructions reordering in Java JVM

查看:21
本文介绍了Java JVM 中的指令重新排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读这篇博文.

而且作者说的是在多线程环境下打破String中的hashCode().

And the author was talking about breaking the hashCode() in String in multithread environment.

通过拥有:

public int hashCode() {
     int h = hash;
     if (h == 0) {
         int off = offset;
         char val[] = value;
         int len = count;

         for (int i = 0; i < len; i++) {
             h = 31*h + val[off++];
         }
         hash = h;
     }
     return h;
 }

改为:

public int hashCode() {
     if (hash == 0) {
         int off = offset;
         char val[] = value;
         int len = count;

         int h = 0;
         for (int i = 0; i < len; i++) {
             h = 31*h + val[off++];
         }
         hash = h;
     }
     return hash;
 }

作者说的,我引用的:

我在这里做的是添加一个额外的读取:第二次读取哈希,在返回之前.听起来很奇怪,也不太可能发生,第一次读取可以返回正确计算的哈希值,第二次读取可以返回 0!这在内存模型下是允许的,因为该模型允许对操作进行大量重新排序.第二次读取实际上可以在您的代码中移动,以便您的处理器在第一个之前做!"

"What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!"

进一步查看评论,有人说可以重新排序

So further going through comments, someone says it can be reordered to

int h = hash;
if (hash == 0) {
  ...
}
return h;

这怎么可能?我认为重新排序只涉及上下移动程序语句.它遵循什么规则?我在谷歌上搜索过,阅读过 JSR133 常见问题解答,查看了 Java 并发实践书,但我似乎找不到一个可以帮助我理解特别是重新排序的地方.如果有人能指出我正确的方向,我将不胜感激.

How is that possible? I thought reordering only involves moving program statements up and down. What rules is it following? I've Googled, read the JSR133 FAQ, checked the Java Concurrency in Practice book, but I can't seem to find a place that helps me to understand particularly on reordering. If anyone can point me to the right direction, I would really appreciate it.

感谢Louis澄清了重新排序"的含义,我没有考虑byteCode"

但是,我仍然不明白为什么允许将 2nd Read 移到前面,这是我将其转换为某种字节码"格式的幼稚尝试.

However, I still don't understand why is it allowed to move the 2nd Read to the front, this is my naive attempt to translate it to somewhat "bytecode" format.

为了简单起见,用于计算哈希码的操作表示为calchash().因此,我将程序表示为:

For simplification purpose, operations that are used to calculate the hashcode are express as calchash(). Therefore, I express the program as:

if (hash == 0)  {       
    h = calchash();
    hash = h;
}
return hash;

我尝试以字节码"形式表达它:

And my attempt to express it in "bytecode" form:

R1,R2,R3 are in the operands stack, or the registers
h is in the array of local variables

按程序顺序:

if (hash == 0)  {       ---------- R1 = read hash from memory (1st read)
                        ---------- Compare (R1 == 0)
    h = calchash();     ---------- R2 = calchash()
                        ---------- h = R2 (Storing the R2 to local variable h)
    hash = h;           ---------- Hash = h (write to hash)
}
return hash             ---------- R3 = read hash from memory again(2nd read)
                        ---------- return R3

重新排序的转换(我的版本基于评论):

Reordered transformation (My version based on comments):

                        ---------- R3 = read hash from memory (2nd read) *moved*
if (hash == 0)  {       ---------- R1 = read hash from memory (1st read)
                        ---------- Compare (R1 == 0)
    h = calchash();     ---------- R2 = calchash()
                        ---------- h = R2 (Storing the R2 to local variable h)
    hash = h;           ---------- hash = h (write to hash)
}
return hash             ---------- return R3

再次查看评论,发现作者已回答:

重新排序的转换(来自博客)

Reordered Transformation (From the blog)

r1 = hash;
if (hash == 0) {
  r1 = hash = // calculate hash
}
return r1;

这种情况实际上适用于单线程,但多线程可能会失败.

This case actually works on single thread, but it's possible to fail with multiple threads.

看来 JVM 正在做简化基于

It seems that the JVM are making simplifications based on

h = hash and it simplifies the use of R1, R2, R3 to single R1

因此,JVM 不仅仅是重新排序指令,它似乎还减少了使用的寄存器数量.

Therefore, JVM does more than reordering instructions, it also seems reducing the amount of registers being used.

推荐答案

在你修改的代码中:

public int hashCode() {
     if (hash == 0) { // (1)
         int off = offset;
         char val[] = value;
         int len = count;

         int h = 0;
         for (int i = 0; i < len; i++) {
             h = 31*h + val[off++];
         }
         hash = h;
     }
     return hash; // (2)
 }

(1) 和 (2) 可以重新排序:(1) 可以读取非空值,而 (2) 可以读取 0.这在 String 类的实际实现中不会发生,因为计算是在局部变量和返回值也是那个局部变量,根据定义,它是线程安全的.

(1) and (2) could be reordered: (1) could read a non null value while (2) would read 0. That can't happen in the actual implementation in the String class because the calculation is made on the local variable and the return value is also that local variable, which, by definition, is thread safe.

问题在于,Java 内存模型无法保证在没有适当同步的情况下访问共享变量 (hash) - 特别是它不能保证所有执行顺序一致.如果 hash 是 volatile,修改后的代码就不会有问题.

The issue is that the Java Memory Model provides no guarantee when a shared variable (hash) is accessed without proper synchronization - in particular it does not guarantee that all executions will be sequentially consistent. Had hash been volatile, there would be no problem with the modified code.

ps:我相信该博客的作者是 JLS 第 17 章(Java 内存模型)的作者之一 - 所以无论如何我都会倾向于相信他 ;-)

ps: the author of that blog, I believe, is one of the writers of the Chapter 17 (Java Memory Model) of the JLS - so I would tend to believe him anyway ;-)

更新

遵循各种编辑/评论 - 让我们使用这两种方法更详细地查看字节码(我假设哈希码始终为 1 以保持简单):

Following the various edits / comments - let's look at the bytecode in more details with these two methods (I assume that the hashcode is always 1 to keep things simple):

public int hashcode_shared() {
    if (hash == 0) { hash = 1; }
    return hash;
}

public int hashcode_local() {
    int h = hash;
    if (h == 0) { hash = h = 1; }
    return h;
}

我机器上的 java 编译器生成以下字节码:

The java compiler on my machine generates the following bytecode:

public int hashcode_shared();
   0: aload_0                           //read this
   1: getfield      #6                  //read hash (r1)
   4: ifne          12                  //compare r1 with 0
   7: aload_0                           //read this
   8: iconst_1                          //constant 1
   9: putfield      #6                  //put 1 into hash (w1)
  12: aload_0                           //read this
  13: getfield      #6                  //read hash (r2)
  16: ireturn                           //return r2

public int hashcode_local();
   0: aload_0                           //read this
   1: getfield      #6                  //read hash (r1)
   4: istore_1                          //store r1 in local variable h
   5: iload_1                           //read h
   6: ifne          16                  //compare h with 0
   9: aload_0                           //read this
  10: iconst_1                          //constant 1
  11: dup                               //constant again
  12: istore_1                          //store 1 into h
  13: putfield      #6                  //store 1 into hash (w1)
  16: iload_1                           //read h
  17: ireturn                           //return h

在第一个示例中,共享变量 hash 有 2 次读取:r1 和 r2.如上所述,因为没有同步并且变量是共享的,所以 Java 内存模型适用,并且允许编译器/JVM 对两次读取重新排序:第 13 行可以插入到第 1 行* 之前.

In the first example, there are 2 reads of the shared variable hash: r1 and r2. As discussed above, because there is no synchronization and the variable is shared, the Java Memory Model applies and a compiler/JVM is allowed to reorder the two reads: line #13 could be inserted before line #1*.

在第二个例子中,由于线程内语义和非共享变量的程序顺序保证,所有对局部变量h的操作都需要顺序一致.

In the second example, all the operations on h, the local variable, need to be sequentially consistent because because of intra-thread semantics and program order guarantee on non-shared variables.

注意:与往常一样,允许重新排序的事实并不意味着它会被执行.这实际上不太可能发生在当前的 x86/热点组合上.但它可能发生在其他当前或未来的架构/JVM 上.

Note: as always, the fact that the reordering is allowed does not mean it will be performed. It is actually unlikely to happen on current x86/hotspot combinations. But it could happen on other current or future architectures/JVM.

*这是一个捷径,在实践中可能发生的情况是编译器可能会像这样重写hashcode_shared:

*That is a bit of a shortcut, what could happen in practice is that the compiler might rewrite hashcode_shared like this:

public int hashcode_shared() {
    int h = hash;
    if (hash != 0) return h;
    return (hash = 1);
}

该代码在单线程环境中严格等效(它将始终返回与原始方法相同的值),因此允许重新排序.但是在多线程环境下,很明显,如果hash在前两行之间被另一个线程从0改为1,这个重新排序的方法会错误地返回0.

The code is strictly equivalent in a single threaded environment (it will always return the same value as the original method) so the reordering is allowed. But in a multi-threaded environment, it is clear that if hash is changed from 0 to 1 by another thread between the first two lines, this reordered method will incorrectly return 0.

这篇关于Java JVM 中的指令重新排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆