性能说明:预热后代码运行较慢 [英] Performance Explanation: code runs slower after warm up

查看:70
本文介绍了性能说明:预热后代码运行较慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下面的代码运行完全相同的计算3次(执行不大:基本上将1到100m的所有数字相加). 前两个块的运行速度大约比第三个块快10倍.我已经运行了10多次此测试程序,结果显示差异很小.

The code below runs the exact same calculation 3 times (it does not do much: basically adding all numbers from 1 to 100m). The first 2 blocks run approximately 10 times faster than the third one. I have run this test program more than 10 times and the results show very little variance.

如果有的话,我希望第三个块运行得更快(JIT编译),但是典型的输出是:

If anything, I would expect the third block to run faster (JIT compilation), but the typical output is:

35974537
36368455
296471550

35974537
36368455
296471550

有人可以解释发生了什么吗? (请注意,我不是要在这里修复任何问题,只是想更好地了解正在发生的事情)

Can somebody explain what is happening? (Just to be clear, I'm not trying to fix anything here, just trying to better understand what is going on)

注意:

  • 程序运行过程中没有运行GC(用-XX:+PrintGC监视)
  • 使用Oracle JDK 1.6.0_30、1.7.0_02和1.7.0_05版进行了测试
  • 还使用以下参数进行了测试:-XX:+PrintGC -Xms1000m -Xmx1000m -XX:NewSize=900m =>相同的结果
  • 如果将块置于循环中,则所有运行都很快
  • 如果将块提取到方法中,则所有运行都很快(无论该方法被调用3次还是循环调用都无济于事)
  • no GC is run during the program (monitored with -XX:+PrintGC)
  • tested with Oracle JDK versions 1.6.0_30, 1.7.0_02 and 1.7.0_05
  • also tested with the following parameters: -XX:+PrintGC -Xms1000m -Xmx1000m -XX:NewSize=900m => same result
  • it the block is put in a loop instead, all runs are fast
  • if the block is extracted to a method, all runs are fast (whether the method is called 3 times or in a loop makes no difference)
public static void main(String... args) {
    //three identical blocks
    {
        long start = System.nanoTime();
        CountByOne c = new CountByOne();
        int sum = 0;
        for (int i = 0; i < 100000000; i++) {
            sum += c.getNext();
        }
        if (sum != c.getSum()) throw new IllegalStateException(); //use sum
        long end = System.nanoTime();
        System.out.println((end - start));
    }
    {
        long start = System.nanoTime();
        CountByOne c = new CountByOne();
        int sum = 0;
        for (int i = 0; i < 100000000; i++) {
            sum += c.getNext();
        }
        if (sum != c.getSum()) throw new IllegalStateException(); //use sum
        long end = System.nanoTime();
        System.out.println((end - start));
    }
    {
        long start = System.nanoTime();
        CountByOne c = new CountByOne();
        int sum = 0;
        for (int i = 0; i < 100000000; i++) {
            sum += c.getNext();
        }
        if (sum != c.getSum()) throw new IllegalStateException(); //use sum
        long end = System.nanoTime();
        System.out.println((end - start));
    }
}

public static class CountByOne {

    private int i = 0;
    private int sum = 0;

    public int getSum() {
        return sum;
    }

    public int getNext() {
        i += 1;
        sum += i;
        return i;
    }
}

推荐答案

简短:Just In Time编译器很笨.

Short: The Just In Time Compiler is dumb.

首先,您可以使用选项-XX:+PrintCompilation来查看JIT做某事的时间.然后,您将看到类似以下内容:

First of all you can use the option -XX:+PrintCompilation to see WHEN the JIT is doing something. Then you will see something like this:

$ java -XX:+PrintCompilation weird
    168    1             weird$CountByOne::getNext (28 bytes)
    174    1 %           weird::main @ 18 (220 bytes)
    279    1 %           weird::main @ -2 (220 bytes)   made not entrant
113727636
    280    2 %           weird::main @ 91 (220 bytes)
106265475
427228826

因此您会看到main方法有时是在第一个和第二个块期间编译的.

So you see that the method main is compiled sometimes during the first and the second block.

添加选项-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOption将为您提供有关JIT正在执行的操作的更多信息.注意,它需要hsdis-amd64.so,在常见的Linux发行版中似乎不太可用.您可能必须要从OpenJDK自己编译它.

Adding the options -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOption will give you more information about what the JIT is doing. Note, it requires hsdis-amd64.so which seems to be not very well available on common Linux distributions. You might have tom compile it on your own from the OpenJDK.

您得到的是getNext和main的大量汇编代码.

What you get is a huge chunk of assembler code for getNext and main.

对我来说,在第一次编译中,似乎实际上只有main中的第一个块才被编译,您可以通过行号来判断.它包含如下有趣的内容:

For me, in the first compilation it seems that only the first block in main is actually compiled, you can tell by the line numbers. It contains funny things like this:

  0x00007fa35505fc5b: add    $0x1,%r8           ;*ladd
                                                ; - weird$CountByOne::getNext@6 (line 12)
                                                ; - weird::main@28 (line 31)
  0x00007fa35505fc5f: mov    %r8,0x10(%rbx)     ;*putfield i
                                                ; - weird$CountByOne::getNext@7 (line 12)
                                                ; - weird::main@28 (line 31)
  0x00007fa35505fc63: add    $0x1,%r14          ;*ladd
                                                ; - weird::main@31 (line 31)

(实际上是很长的,因为展开和内联了循环)

(Indeed it is very long, due to unrolling and inlining of the loop)

很明显,在main的重新编译过程中,第二个AND第三个块已编译.那里的第二个块看起来与第一个版本非常相似. (再次摘录)

Appearently during the recompile of main, the second AND third block is compiled. The second block there looks very similar to the first version. (Again just an excerpt)

 0x00007fa35505f05d: add    $0x1,%r8           ;*ladd
                                                ; - weird$CountByOne::getNext@6 (line 12)
                                                ; - weird::main@101 (line 42)
  0x00007fa35505f061: mov    %r8,0x10(%rbx)     ;*putfield i
                                                ; - weird$CountByOne::getNext@7 (line 12)
                                                ; - weird::main@101 (line 42)
  0x00007fa35505f065: add    $0x1,%r13          ;*ladd

但是,第三块的编译方式不同.无需内联和展开

HOWEVER the third block is compiled differently. Without inlining and unrolling

这一次,整个循环如下所示:

This time the entire loop looks like this:

  0x00007fa35505f20c: xor    %r10d,%r10d
  0x00007fa35505f20f: xor    %r8d,%r8d          ;*lload
                                                ; - weird::main@171 (line 53)
  0x00007fa35505f212: mov    %r8d,0x10(%rsp)
  0x00007fa35505f217: mov    %r10,0x8(%rsp)
  0x00007fa35505f21c: mov    %rbp,%rsi
  0x00007fa35505f21f: callq  0x00007fa355037c60  ; OopMap{rbp=Oop off=580}
                                                ;*invokevirtual getNext
                                                ; - weird::main@174 (line 53)
                                                ;   {optimized virtual_call}
  0x00007fa35505f224: mov    0x8(%rsp),%r10
  0x00007fa35505f229: add    %rax,%r10          ;*ladd
                                                ; - weird::main@177 (line 53)
  0x00007fa35505f22c: mov    0x10(%rsp),%r8d
  0x00007fa35505f231: inc    %r8d               ;*iinc
                                                ; - weird::main@180 (line 52)
  0x00007fa35505f234: cmp    $0x5f5e100,%r8d
  0x00007fa35505f23b: jl     0x00007fa35505f212  ;*if_icmpge
                                                ; - weird::main@168 (line 52)

我的猜测是,JIT识别出这部分代码没有被大量使用,因为它使用了第二个块执行中的性能分析信息,因此并未对其进行大量优化.而且,在所有相关部分都已编译之后,不重新编译一种方法的意义上,JIT似乎是懒惰的.请记住,第一个编译结果根本不包含第二个/第三个块AT的源代码,因此JIT必须重新编译它.

My guess is that the JIT identified that this part of the code is not used a lot, since it was using profiling information from the second block execution, and therefore did not optimize it heavily. Also the JIT appears to be lazy in a sense not to recompile one method after all relevant parts have been compiled. Remember the first compilation result did not contain source code for the second/third block AT all, so the JIT had to recompile that.

这篇关于性能说明:预热后代码运行较慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆