强制使用tableswitch而不是lookupswitch [英] Force tableswitch instead of lookupswitch

查看:63
本文介绍了强制使用tableswitch而不是lookupswitch的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Scala 2.11将相对密集的 Int 范围内的 match 表达式编译为 lookupswitch :

  lookupswitch {//21-12:200-11:200-10:184-9:190-8:190-7:190-6:190-5:190-4:200-1:2002:1953:1954:1955:1956:1847:18412:18413:18418:18421:18425:184默认值:180} 

Java 7将等效的 switch 语句编译为 tableswitch :

  tableswitch {//-12至25-12:168-11:168-10:177-9:174-8:174-7:174-6:174-5:174-4:168-3:185-2:185-1:1680:1851:1852:1713:1714:1715:1716:1777:1778:1859:18510:18511:18512:18113:18114:18515:18516:18517:18518:18119:18520:18521:18122:18523:18524:18525:181默认值:185} 

是否有某种方法也可以强制Scala生成 tableswitch ?

解决方案

您不必在意字节码,因为现代JVM足够聪明,可以同时编译 lookupswitch tableswitch以类似的有效方式.

直观地 tableswitch 应该更快,这也由以下建议:

组装

让我们通过查看生成的汇编代码来验证理论.
以下JVM选项将有所帮助: -XX:CompileCommand = print,bench.Switch :: *

 #{方法} {0x0000000017498a48}'bench/Switch'中的'lookupSwitch''(I)I'#parm0:rdx =整数#[sp + 0x20](呼叫方的sp)0x000000000329b240:低于$ 0x18,%rsp0x000000000329b247:mov%rbp,0x10(%rsp); *同步条目;-Bench.Switch :: lookupSwitch @ -10x000000000329b24c:cmp $ 0x4,%edx0x000000000329b24f:je 0x000000000329b2a50x000000000329b251:cmp $ 0x4,%edx0x000000000329b254:jg 0x000000000329b2810x000000000329b256:cmp $ 0x2,%edx0x000000000329b259:je 0x000000000329b27a0x000000000329b25b:cmp $ 0x2,%edx0x000000000329b25e:jg 0x000000000329b2730x000000000329b260:cmp $ 0x1,%edx0x000000000329b263:jne 0x000000000329b26c; * lookupswitch;-Bench.Switch :: lookupSwitch @ 1... 

我们在这里看到的是从中间值4开始的二进制搜索(这解释了为什么案例4在上图中具有最佳性能).

但是有趣的是, tableSwitch 的编译方式完全相同!

 #{方法} {0x0000000017528b18}'bench/Switch'中的'tableSwitch''(I)I'#parm0:rdx =整数#[sp + 0x20](呼叫方的sp)0x000000000332c280:sub $ 0x18,%rsp0x000000000332c287:mov%rbp,0x10(%rsp); *同步条目;-bench.Switch :: tableSwitch @ -10x000000000332c28c:cmp $ 0x4,%edx0x000000000332c28f:je 0x000000000332c2e50x000000000332c291:cmp $ 0x4,%edx0x000000000332c294:jg 0x000000000332c2c10x000000000332c296:cmp $ 0x2,%edx0x000000000332c299:je 0x000000000332c2ba0x000000000332c29b:cmp $ 0x2,%edx0x000000000332c29e:jg 0x000000000332c2b30x000000000332c2a0:cmp $ 0x1,%edx0x000000000332c2a3:jne 0x000000000332c2ac; *表开关;-bench.Switch :: tableSwitch @ 1... 

跳转表

但是等等……为什么要二进制搜索,而不是跳转表?

HotSpot JVM具有启发式功能,只能为10个以上案例的交换机生成跳转表.可以通过选项 -XX:MinJumpTableSize = 进行更改.

好的,让我们用更多标签扩展测试用例,然后再次检查生成的代码.

 #{方法} {0x0000000017288a68}'bench/Switch'中的'lookupSwitch''(I)I'#parm0:rdx =整数#[sp + 0x20](呼叫方的sp)0x000000000307ecc0:子$ 0x18,%rsp;{no_reloc}0x000000000307ecc7:mov%rbp,0x10(%rsp); *同步条目;-Bench.Switch :: lookupSwitch @ -10x000000000307eccc:mov%edx,%r10d0x000000000307eccf:dec%r10d0x000000000307ecd2:cmp $ 0xa,%r10d0x000000000307ecd6:jb 0x000000000307ece90x000000000307ecd8:移动$ 0xffffffff,%eax0x000000000307ecdd:加$ 0x10,%rsp0x000000000307ece1:弹出%rbp0x000000000307ece2:测试%eax,-0x1faece8(%rip)#0x00000000010d0000;{poll_return}0x000000000307ece8:retq0x000000000307ece9:movslq%edx,%r100x000000000307ecec:movabs $ 0x307ec60,%r11;{section_word}0x000000000307ecf6:jmpq * -0x8(%r11,%r10,8); * lookupswitch;-Bench.Switch :: lookupSwitch @ 1^^^^^^^^^^^^^^^^^^^^^^^^^^ 

是的!这是我们计算出的跳转指令.请注意,这是为 lookupswitch 生成的.但是 tableswitch 将完​​全相同.

令人惊讶的是,HotSpot JVM甚至可以为带有间隙和异常值的交换机生成跳转表. -XX:MaxJumpTableSparseness 控制差距有多大.例如.如果标签的范围是1到10,则标签的范围是13到20,最后一个标签的值为99-JIT将为值99生成保护测试,其余标签将创建一个表.

源代码

HotSpot源代码将最终说服 解决方案

You should not care about the bytecode, since modern JVMs are smart enough to compile both lookupswitch and tableswitch in a similarly efficient way.

Intuitively tableswitch should be faster, and this is also suggested by JVM specification:

Thus, a tableswitch instruction is probably more efficient than a lookupswitch where space considerations permit a choice.

However, the spec was written 20+ years ago, when JVM had no JIT compiler. Is there a performance difference in a modern HotSpot JVM?

The benchmark

package bench;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class SwitchBench {
    @Param({"1", "2", "3", "4", "5", "6", "7", "8"})
    int n;

    @Benchmark
    public long lookupSwitch() {
        return Switch.lookupSwitch(n);
    }

    @Benchmark
    public long tableSwitch() {
        return Switch.tableSwitch(n);
    }
}

To have precise control over the bytecode, I build Switch class with Jasmin.

.class public bench/Switch
.super java/lang/Object

.method public static lookupSwitch(I)I
    .limit stack 1

    iload_0
    lookupswitch
      1 : One
      2 : Two
      3 : Three
      4 : Four
      5 : Five
      6 : Six
      7 : Seven
      default : Other

One:
    bipush 11
    ireturn
Two:
    bipush 22
    ireturn
Three:
    bipush 33
    ireturn
Four:
    bipush 44
    ireturn
Five:
    bipush 55
    ireturn
Six:
    bipush 66
    ireturn
Seven:
    bipush 77
    ireturn
Other:
    bipush -1
    ireturn
.end method

.method public static tableSwitch(I)I
    .limit stack 1

    iload_0
    tableswitch 1
      One
      Two
      Three
      Four
      Five
      Six
      Seven
      default : Other

One: 
    bipush 11
    ireturn
Two:
    bipush 22
    ireturn
Three:
    bipush 33
    ireturn
Four:
    bipush 44
    ireturn
Five:
    bipush 55
    ireturn
Six:
    bipush 66
    ireturn
Seven:
    bipush 77
    ireturn
Other:
    bipush -1
    ireturn
.end method

The results show no performance difference between lookupswitch/tableswitch benchmarks, but there is a small variation depending on switch argument:

Assembly

Let's verify the theory by looking at generated assembly code.
The following JVM option will help: -XX:CompileCommand=print,bench.Switch::*

  # {method} {0x0000000017498a48} 'lookupSwitch' '(I)I' in 'bench/Switch'
  # parm0:    rdx       = int
  #           [sp+0x20]  (sp of caller)
  0x000000000329b240: sub    $0x18,%rsp
  0x000000000329b247: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - bench.Switch::lookupSwitch@-1

  0x000000000329b24c: cmp    $0x4,%edx
  0x000000000329b24f: je     0x000000000329b2a5
  0x000000000329b251: cmp    $0x4,%edx
  0x000000000329b254: jg     0x000000000329b281
  0x000000000329b256: cmp    $0x2,%edx
  0x000000000329b259: je     0x000000000329b27a
  0x000000000329b25b: cmp    $0x2,%edx
  0x000000000329b25e: jg     0x000000000329b273
  0x000000000329b260: cmp    $0x1,%edx
  0x000000000329b263: jne    0x000000000329b26c  ;*lookupswitch
                                                 ; - bench.Switch::lookupSwitch@1
  ...

What we see here is a binary search starting with a mid value 4 (this explains why case 4 has the best performance in the graph above).

But the interesting thing is that tableSwitch is compiled exactly the same way!

  # {method} {0x0000000017528b18} 'tableSwitch' '(I)I' in 'bench/Switch'
  # parm0:    rdx       = int
  #           [sp+0x20]  (sp of caller)
  0x000000000332c280: sub    $0x18,%rsp
  0x000000000332c287: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - bench.Switch::tableSwitch@-1

  0x000000000332c28c: cmp    $0x4,%edx
  0x000000000332c28f: je     0x000000000332c2e5
  0x000000000332c291: cmp    $0x4,%edx
  0x000000000332c294: jg     0x000000000332c2c1
  0x000000000332c296: cmp    $0x2,%edx
  0x000000000332c299: je     0x000000000332c2ba
  0x000000000332c29b: cmp    $0x2,%edx
  0x000000000332c29e: jg     0x000000000332c2b3
  0x000000000332c2a0: cmp    $0x1,%edx
  0x000000000332c2a3: jne    0x000000000332c2ac  ;*tableswitch
                                                 ; - bench.Switch::tableSwitch@1
  ...

Jump table

But wait... Why binary search, not a jump table?

HotSpot JVM has an heuristics to generate a jump table only for switches with 10+ cases. This can be altered by the option -XX:MinJumpTableSize=.

OK, let's extend our test case with some more labels and check the generated code once again.

  # {method} {0x0000000017288a68} 'lookupSwitch' '(I)I' in 'bench/Switch'
  # parm0:    rdx       = int
  #           [sp+0x20]  (sp of caller)
  0x000000000307ecc0: sub    $0x18,%rsp         ;   {no_reloc}
  0x000000000307ecc7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - bench.Switch::lookupSwitch@-1

  0x000000000307eccc: mov    %edx,%r10d
  0x000000000307eccf: dec    %r10d
  0x000000000307ecd2: cmp    $0xa,%r10d
  0x000000000307ecd6: jb     0x000000000307ece9
  0x000000000307ecd8: mov    $0xffffffff,%eax
  0x000000000307ecdd: add    $0x10,%rsp
  0x000000000307ece1: pop    %rbp
  0x000000000307ece2: test   %eax,-0x1faece8(%rip)        # 0x00000000010d0000
                                                ;   {poll_return}
  0x000000000307ece8: retq   
  0x000000000307ece9: movslq %edx,%r10
  0x000000000307ecec: movabs $0x307ec60,%r11    ;   {section_word}
  0x000000000307ecf6: jmpq   *-0x8(%r11,%r10,8)  ;*lookupswitch
                                                ; - bench.Switch::lookupSwitch@1
                      ^^^^^^^^^^^^^^^^^^^^^^^^^

Yes! Here is our computed jump instruction. Note, this is generated for lookupswitch. But there will be exactly the same one for tableswitch.

Amazingly, HotSpot JVM can generate jump tables even for switches with gaps and with outliers. -XX:MaxJumpTableSparseness controls how large the gaps can be. E.g. if there are labels from 1 to 10, then from 13 to 20 and the last label with the value 99 - JIT will generate a guard test for the value 99, and for the rest labels it will create a table.

Source code

HotSpot source code will finally convince there should be no performance difference between lookupswitch and tableswitch after a method is JIT-compiled with C2. That's basically because the parsing of both instructions ends up with a call to the same jump_switch_ranges function that works for an arbitrary set of labels.

Conclusion

As we saw, HotSpot JVM may compile tableswitch using a binary search and lookupswitch using a jump table, or vice versa. This depends on the number and the density of labels, but not on the bytecode itself.

So, answering your original question - you don't need to!

这篇关于强制使用tableswitch而不是lookupswitch的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆