Different performance of an object with the same runtime class but a different static type

Question

Consider the following JMH benchmark:

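import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
class So59893913 {
  // Same body, different static parameter type
  def seq(xs: Seq[Int]) = xs.sum
  def range(xs: Range) = xs.sum

  // One shared Range instance passed to both methods
  val xs = 1 until 100000000
  @Benchmark def _seq = seq(xs)
  @Benchmark def _range = range(xs)
}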

xs references one and the same object (runtime class Range.Exclusive, since it is built with until), and that object is passed as the argument to both the seq and range methods. Dynamic dispatch should therefore invoke the same implementation of sum, even though the declared static type of the method parameter differs. Why, then, does the performance differ so drastically, as shown below?

sbt "jmh:run -i 10 -wi 5 -f 2 -t 1 -prof gc bench.So59893913"

[info] Benchmark                                          Mode  Cnt          Score          Error   Units
[info] So59893913._range                                 thrpt   20  334923591.408 ± 22126865.963   ops/s
[info] So59893913._range:·gc.alloc.rate                  thrpt   20         ≈ 10⁻⁴                 MB/sec
[info] So59893913._range:·gc.alloc.rate.norm             thrpt   20         ≈ 10⁻⁷                   B/op
[info] So59893913._range:·gc.count                       thrpt   20            ≈ 0                 counts
[info] So59893913._seq                                   thrpt   20  193509091.399 ±  2347303.746   ops/s
[info] So59893913._seq:·gc.alloc.rate                    thrpt   20       2811.311 ±       34.142  MB/sec
[info] So59893913._seq:·gc.alloc.rate.norm               thrpt   20         16.000 ±        0.001    B/op
[info] So59893913._seq:·gc.churn.PS_Eden_Space           thrpt   20       2811.954 ±       33.656  MB/sec
[info] So59893913._seq:·gc.churn.PS_Eden_Space.norm      thrpt   20         16.004 ±        0.035    B/op
[info] So59893913._seq:·gc.churn.PS_Survivor_Space       thrpt   20          0.013 ±        0.005  MB/sec
[info] So59893913._seq:·gc.churn.PS_Survivor_Space.norm  thrpt   20         ≈ 10⁻⁴                   B/op
[info] So59893913._seq:·gc.count                         thrpt   20       3729.000                 counts
[info] So59893913._seq:·gc.time                          thrpt   20       1864.000                     ms

Note in particular the difference in the gc.alloc.rate metrics.

Recommended answer

There are two things going on.

The first is that when xs has the static type Range, the call to sum is a monomorphic method call (because sum is final in Range), so the JVM can easily inline that method and optimize it further. When xs has the static type Seq, it becomes a megamorphic method call, which is unlikely to be inlined and fully optimized.

The second is that the methods that get called are not actually the same. The compiler generates two sum methods in Range:

scala> :javap -p scala.collection.immutable.Range
Compiled from "Range.scala"
public abstract class scala.collection.immutable.Range extends scala.collection.immutable.AbstractSeq<java.lang.Object> implements scala.collection.immutable.IndexedSeq<java.lang.Object>, scala.collection.immutable.StrictOptimizedSeqOps<java.lang.Object, scala.collection.immutable.IndexedSeq, scala.collection.immutable.IndexedSeq<java.lang.Object>>, java.io.Serializable {
...
public final <B> int sum(scala.math.Numeric<B>);
...
public final java.lang.Object sum(scala.math.Numeric);
...
}

The first one contains the actual implementation that you see in the source code, and as you can see, it returns an unboxed int. The second one looks like this:

  public final java.lang.Object sum(scala.math.Numeric);
    Code:
       0: aload_0
       1: aload_1
       2: invokevirtual #898                // Method sum:(Lscala/math/Numeric;)I
       5: invokestatic  #893                // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
       8: areturn

As you can see, this one just calls the other sum method and boxes the int into a java.lang.Integer.
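The same pair of methods can be reproduced with a small hierarchy. The following is a minimal sketch, not the library's code: MiniSeq, MiniRange, and BridgeSketch are hypothetical names invented for illustration, and the closed-form body is only an assumption about how such a sum might be written. It narrows the return type of a generic sum to Int and then uses Java reflection to list the erased return types of every sum method the compiler emitted.

```scala
// Hypothetical miniature of the Seq/Range relationship; names are illustrative only.
trait MiniSeq[+A] {
  // Generic signature: erases to `Object sum(Numeric)` on the JVM.
  def sum[B >: A](implicit num: Numeric[B]): B
}

final class MiniRange(start: Int, end: Int) extends MiniSeq[Int] {
  // Narrowed return type (Int instead of B): erases to `int sum(Numeric)`.
  override def sum[B >: Int](implicit num: Numeric[B]): Int = {
    val n = math.max(end.toLong - start, 0L)   // element count of [start, end)
    (n * (start.toLong + end - 1) / 2).toInt   // closed-form arithmetic series
  }
}

object BridgeSketch {
  // Erased return types of every `sum` method declared on MiniRange.
  def sumReturnTypes: List[String] =
    classOf[MiniRange].getDeclaredMethods.toList
      .filter(_.getName == "sum")
      .map(_.getReturnType.getName)
      .sorted
}
```

Calling BridgeSketch.sumReturnTypes should list both int and java.lang.Object, mirroring the two entries in the javap output above: the real implementation and the compiler-generated method that boxes its result.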

So in your seq method, the compiler only knows about the sum method whose return type is java.lang.Object, and it calls that one. It probably does not get inlined, and the java.lang.Integer that it returns has to be unboxed again so that seq can return an int. This boxing is consistent with the 16 B/op allocation reported for _seq. In range, the compiler can generate a call to the "real" sum method without having to box and unbox the result. The JVM can also do a better job of inlining and optimizing the code.
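Because the choice between the two sum methods is made from the static type at compile time, one possible way to reach the Int-returning method from a Seq[Int] parameter is to narrow the static type first. The sketch below is an assumption-laden illustration, not something measured in the benchmark above: StaticTypeSketch and seqNarrowed are hypothetical names, and whether this actually recovers the throughput of range would need to be verified with JMH.

```scala
object StaticTypeSketch {
  // Hypothetical variant of the question's `seq` method.
  def seqNarrowed(xs: Seq[Int]): Int = xs match {
    // Inside this case `r` has static type Range, so the compiler can
    // emit a call to the sum method that returns an unboxed int.
    case r: Range => r.sum
    // Any other Seq goes through the generic, erased sum and is unboxed afterwards.
    case other    => other.sum
  }
}
```

For example, StaticTypeSketch.seqNarrowed(1 until 101) takes the Range branch, while StaticTypeSketch.seqNarrowed(List(1, 2, 3)) falls back to the generic path; both return the same values as a plain xs.sum.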
