Why is "while (i++ < n) {}" significantly slower than "while (++i < n) {}"

Question

Apparently on my Windows 8 laptop with HotSpot JDK 1.7.0_45 (with all compiler/VM options set to default), the below loop

final int n = Integer.MAX_VALUE;
int i = 0;
while (++i < n) {
}

is at least 2 orders of magnitude faster (~10 ms vs. ~5000 ms) than:

final int n = Integer.MAX_VALUE;
int i = 0;
while (i++ < n) {
}

I happened to notice this problem while writing a loop to evaluate another, unrelated performance issue. The difference between ++i < n and i++ < n was large enough to significantly influence the result.

If we look at the bytecode, the loop body of the faster version is:

iinc
iload
ldc
if_icmplt

and for the slower version:

iload
iinc
ldc
if_icmplt

So for ++i < n, it first increments the local variable i by 1 and then pushes it onto the operand stack, while i++ < n does those two steps in reverse order. But that doesn't seem to explain why the former is much faster. Is there any temporary copy involved in the latter case? Or is it something beyond the bytecode (VM implementation, hardware, etc.) that is responsible for the performance difference?
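
To make the semantic difference between the two loop conditions concrete, here is a minimal sketch. The class and method names are made up for illustration and do not appear in the question. It counts how often the loop body runs and reports the final value of i for both forms, using a small bound.

```java
// Hypothetical helper for illustration; not part of the original question.
class IncrementSemantics {

    // Pre-increment: i is incremented first, then the new value is compared.
    // Returns {number of body executions, final value of i}.
    static int[] preIncrement(int n) {
        int i = 0;
        int bodyRuns = 0;
        while (++i < n) {
            bodyRuns++;
        }
        return new int[] { bodyRuns, i };
    }

    // Post-increment: the old value of i is used in the comparison, and
    // i is incremented regardless of whether the comparison succeeds.
    static int[] postIncrement(int n) {
        int i = 0;
        int bodyRuns = 0;
        while (i++ < n) {
            bodyRuns++;
        }
        return new int[] { bodyRuns, i };
    }

    public static void main(String[] args) {
        int[] pre = preIncrement(5);
        int[] post = postIncrement(5);
        System.out.println("pre : body runs = " + pre[0] + ", final i = " + pre[1]);
        System.out.println("post: body runs = " + post[0] + ", final i = " + post[1]);
    }
}
```

With a bound of 5, the pre-increment loop runs its body 4 times and ends with i equal to 5, while the post-increment loop runs its body 5 times and ends with i equal to 6. The two loops are therefore not strictly equivalent, even before any performance considerations.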

I've read some other discussions regarding ++i and i++ (not exhaustively, though), but didn't find any answer that is Java-specific and directly related to the case where ++i or i++ is involved in a value comparison.

Recommended answer

As others have pointed out, the test is flawed in many ways.

You did not tell us exactly how you did this test. However, I tried to implement a "naive" test (no offense) like this:

class PrePostIncrement
{
    public static void main(String args[])
    {
        for (int j=0; j<3; j++)
        {
            for (int i=0; i<5; i++)
            {
                long before = System.nanoTime();
                runPreIncrement();
                long after = System.nanoTime();
                System.out.println("pre  : "+(after-before)/1e6);
            }
            for (int i=0; i<5; i++)
            {
                long before = System.nanoTime();
                runPostIncrement();
                long after = System.nanoTime();
                System.out.println("post : "+(after-before)/1e6);
            }
        }
    }

    private static void runPreIncrement()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (++i < n) {}
    }

    private static void runPostIncrement()
    {
        final int n = Integer.MAX_VALUE;
        int i = 0;
        while (i++ < n) {}
    }
}

When running this with default settings, there seems to be a small difference. But the real flaw of the benchmark becomes obvious when you run this with the -server flag. The results in my case (in milliseconds) then look something like this:

...
pre  : 6.96E-4
pre  : 6.96E-4
pre  : 0.001044
pre  : 3.48E-4
pre  : 3.48E-4
post : 1279.734543
post : 1295.989086
post : 1284.654267
post : 1282.349093
post : 1275.204583

Obviously, the pre-increment version has been completely optimized away. The reason is rather simple: The result is not used. It does not matter at all whether the loop is executed or not, so the JIT simply removes it.
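
A common way to make this kind of dead-code elimination less likely is to let the loop produce a value that is actually consumed. The following is only a sketch under that assumption: the class name, the sink field and the smaller bound are hypothetical and not part of the original test. A JIT may still replace a simple counting loop with its closed-form result, so for serious measurements a dedicated harness such as JMH is the usual recommendation.

```java
// Hypothetical variant of the benchmark; all names are illustrative only.
class ConsumedResultBenchmark {

    // Written after every run so that the loop result is observably used.
    static volatile int sink;

    static int runPreIncrement(int n) {
        int i = 0;
        while (++i < n) {}
        return i; // returning i makes the final counter value a live result
    }

    static int runPostIncrement(int n) {
        int i = 0;
        while (i++ < n) {}
        return i;
    }

    public static void main(String[] args) {
        final int n = 100_000_000; // assumed bound, smaller than in the question
        for (int round = 0; round < 5; round++) {
            long before = System.nanoTime();
            sink = runPreIncrement(n);
            long middle = System.nanoTime();
            sink = runPostIncrement(n);
            long after = System.nanoTime();
            System.out.println("pre  : " + (middle - before) / 1e6 + " ms");
            System.out.println("post : " + (after - middle) / 1e6 + " ms");
        }
    }
}
```

Even in this form, the first rounds include interpretation and compilation time, so only the later rounds say anything about the compiled code.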

This is confirmed by a look at the HotSpot disassembly. The pre-increment version results in this code:

[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x0000000055060500} 'runPreIncrement' '()V' in 'PrePostIncrement'
  #           [sp+0x20]  (sp of caller)
  0x000000000286fd80: sub    $0x18,%rsp
  0x000000000286fd87: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - PrePostIncrement::runPreIncrement@-1 (line 28)

  0x000000000286fd8c: add    $0x10,%rsp
  0x000000000286fd90: pop    %rbp
  0x000000000286fd91: test   %eax,-0x243fd97(%rip)        # 0x0000000000430000
                                                ;   {poll_return}
  0x000000000286fd97: retq   
  0x000000000286fd98: hlt    
  0x000000000286fd99: hlt    
  0x000000000286fd9a: hlt    
  0x000000000286fd9b: hlt    
  0x000000000286fd9c: hlt    
  0x000000000286fd9d: hlt    
  0x000000000286fd9e: hlt    
  0x000000000286fd9f: hlt    

The post-increment version results in this code:

[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x00000000550605b8} 'runPostIncrement' '()V' in 'PrePostIncrement'
  #           [sp+0x20]  (sp of caller)
  0x000000000286d0c0: sub    $0x18,%rsp
  0x000000000286d0c7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - PrePostIncrement::runPostIncrement@-1 (line 35)

  0x000000000286d0cc: mov    $0x1,%r11d
  0x000000000286d0d2: jmp    0x000000000286d0e3
  0x000000000286d0d4: nopl   0x0(%rax,%rax,1)
  0x000000000286d0dc: data32 data32 xchg %ax,%ax
  0x000000000286d0e0: inc    %r11d              ; OopMap{off=35}
                                                ;*goto
                                                ; - PrePostIncrement::runPostIncrement@11 (line 36)

  0x000000000286d0e3: test   %eax,-0x243d0e9(%rip)        # 0x0000000000430000
                                                ;*goto
                                                ; - PrePostIncrement::runPostIncrement@11 (line 36)
                                                ;   {poll}
  0x000000000286d0e9: cmp    $0x7fffffff,%r11d
  0x000000000286d0f0: jl     0x000000000286d0e0  ;*if_icmpge
                                                ; - PrePostIncrement::runPostIncrement@8 (line 36)

  0x000000000286d0f2: add    $0x10,%rsp
  0x000000000286d0f6: pop    %rbp
  0x000000000286d0f7: test   %eax,-0x243d0fd(%rip)        # 0x0000000000430000
                                                ;   {poll_return}
  0x000000000286d0fd: retq   
  0x000000000286d0fe: hlt    
  0x000000000286d0ff: hlt    

It's not entirely clear to me why it seemingly does not remove the post-increment version. (In fact, I am considering asking this as a separate question.) But at least this explains why you might see differences of "orders of magnitude"...

EDIT: Interestingly, when the upper limit of the loop is changed from Integer.MAX_VALUE to Integer.MAX_VALUE-1, both versions are optimized away and require "zero" time. Somehow this limit (which still appears as 0x7fffffff in the assembly) prevents the optimization. Presumably, this has something to do with the comparison being mapped to a (signed!) cmp instruction, but I cannot give a profound reason beyond that. The JIT works in mysterious ways...
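
One observable difference between the two bounds is that, with n = Integer.MAX_VALUE, the post-increment loop increments i one last time past Integer.MAX_VALUE, so the counter wraps around to Integer.MIN_VALUE. Whether this overflow is what actually keeps the JIT from removing the loop is an assumption, not something the disassembly above confirms. The sketch below only demonstrates the wrap-around itself. The class name, the method names and the start offset are hypothetical, chosen so that the loops finish after a few iterations.

```java
// Hypothetical demonstration of the counter value after each loop form.
class OverflowAtMaxValue {

    // Starts close to the bound so only a few iterations are needed.
    static int finalValuePre(int start, int n) {
        int i = start;
        while (++i < n) {}
        return i;
    }

    static int finalValuePost(int start, int n) {
        int i = start;
        while (i++ < n) {}
        return i; // i was incremented once more after the failing comparison
    }

    public static void main(String[] args) {
        int start = Integer.MAX_VALUE - 3;
        System.out.println("pre,  n = MAX_VALUE    : "
                + finalValuePre(start, Integer.MAX_VALUE));
        System.out.println("post, n = MAX_VALUE    : "
                + finalValuePost(start, Integer.MAX_VALUE));
        System.out.println("post, n = MAX_VALUE - 1: "
                + finalValuePost(start, Integer.MAX_VALUE - 1));
    }
}
```

The pre-increment loop ends with i equal to Integer.MAX_VALUE. The post-increment loop with the same bound ends with i equal to Integer.MIN_VALUE, while with the bound Integer.MAX_VALUE-1 it ends at Integer.MAX_VALUE without overflowing.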
