Scala的隐藏性能成本? [英] Hidden performance cost in Scala?

查看:103
本文介绍了Scala的隐藏性能成本?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我遇到了这个旧问题和用scala 2.10.3进行了以下实验。

I came across this old question and did the following experiment with scala 2.10.3.

我重写了Scala版本以使用显式尾递归:

I rewrote the Scala version to use explicit tail recursion:

import scala.annotation.tailrec

object ScalaMain {
  private val t = 20

  private def run() {
    var i = 10
    while(!isEvenlyDivisible(2, i, t))
      i += 2
    println(i)
  }

  @tailrec private def isEvenlyDivisible(i: Int, a: Int, b: Int): Boolean = {
    if (i > b) true
    else (a % i == 0) && isEvenlyDivisible(i+1, a, b)
  }

  def main(args: Array[String]) {
    val t1 = System.currentTimeMillis()
    var i = 0
    while (i < 20) {
      run()
      i += 1
    }
    val t2 = System.currentTimeMillis()
    println("time: " + (t2 - t1))
  }
}

并将其与以下Java版本进行比较。为了与Scala公平比较,我有意识地使函数非静态:

and compared it to the following Java version. I consciously made the functions non-static for a fair comparison with Scala:

public class JavaMain {
    private final int t = 20;

    private void run() {
        int i = 10;
        while (!isEvenlyDivisible(2, i, t))
            i += 2;
        System.out.println(i);
    }

    private boolean isEvenlyDivisible(int i, int a, int b) {
        if (i > b) return true;
        else return (a % i == 0) && isEvenlyDivisible(i+1, a, b);
    }

    public static void main(String[] args) {
        JavaMain o = new JavaMain();
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < 20; ++i)
          o.run();
        long t2 = System.currentTimeMillis();
        System.out.println("time: " + (t2 - t1));
    }
}

以下是我计算机上的结果:

Here are the results on my computer:

> java JavaMain
....
time: 9651
> scala ScalaMain
....
time: 20592

这是scala 2.10 .3 on(Java HotSpot(TM)64位服务器VM,Java 1.7.0_51)。

This is scala 2.10.3 on (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51).

我的问题是scala版本的隐藏成本是多少?

My question is what is the hidden cost with the scala version?

非常感谢。

推荐答案

嗯,OP的基准测试不是理想的一。需要减少许多影响,包括预热,消除死代码,分叉等等。幸运的是, JMH 已经处理了很多事情,并且对Java和Scala都有绑定。请按照JMH页面上的程序获取基准项目,然后你可以移植下面的基准测试。

Well, OP's benchmarking is not the ideal one. Tons of effects need to be mitigated, including warmup, dead code elimination, forking, etc. Luckily, JMH already takes care of many things, and has bindings for both Java and Scala. Please follow the procedures on JMH page to get the benchmark project, then you can transplant the benchmarks below there.

这是Java基准测试示例:

This is the sample Java benchmark:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Fork(3)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class JavaBench {

    @Param({"1", "5", "10", "15", "20"})
    int t;

    private int run() {
        int i = 10;
        while(!isEvenlyDivisible(2, i, t))
            i += 2;
        return i;
    }

    private boolean isEvenlyDivisible(int i, int a, int b) {
        if (i > b)
            return true;
        else
            return (a % i == 0) && isEvenlyDivisible(i + 1, a, b);
    }

    @GenerateMicroBenchmark
    public int test() {
        return run();
    }

}

...这就是示例Scala基准:

...and this is the sample Scala benchmark:

@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Fork(3)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
class ScalaBench {

  @Param(Array("1", "5", "10", "15", "20"))
  var t: Int = _

  private def run(): Int = {
    var i = 10
    while(!isEvenlyDivisible(2, i, t))
      i += 2
    i
  }

  @tailrec private def isEvenlyDivisible(i: Int, a: Int, b: Int): Boolean = {
    if (i > b) true
    else (a % i == 0) && isEvenlyDivisible(i + 1, a, b)
  }

  @GenerateMicroBenchmark
  def test(): Int = {
    run()
  }

}

如果你在JDK 8 GA,Linux x86_64上运行这些,那么你会得到:

If you run these on JDK 8 GA, Linux x86_64, then you'll get:

Benchmark             (t)   Mode   Samples         Mean   Mean error    Units
o.s.ScalaBench.test     1   avgt        15        0.005        0.000    us/op
o.s.ScalaBench.test     5   avgt        15        0.489        0.001    us/op
o.s.ScalaBench.test    10   avgt        15       23.672        0.087    us/op
o.s.ScalaBench.test    15   avgt        15     3406.492        9.239    us/op
o.s.ScalaBench.test    20   avgt        15  2483221.694     5973.236    us/op

Benchmark            (t)   Mode   Samples         Mean   Mean error    Units
o.s.JavaBench.test     1   avgt        15        0.002        0.000    us/op
o.s.JavaBench.test     5   avgt        15        0.254        0.007    us/op
o.s.JavaBench.test    10   avgt        15       12.578        0.098    us/op
o.s.JavaBench.test    15   avgt        15     1628.694       11.282    us/op
o.s.JavaBench.test    20   avgt        15  1066113.157    11274.385    us/op

请注意我们玩弄 t 以查看是否对于 t 的特定值,效果是本地的。它不是,效果是系统的,Java版本的速度是其两倍。

Notice we juggle t to see if the effect is local for the particular value of t. It is not, the effect is systematic, and Java version being twice as fast.

PrintAssembly 将放弃一些对此有所了解。这是Scala基准测试中最热门的一块:

PrintAssembly will shed some light on this. This one is the hottest block in Scala benchmark:

0x00007fe759199d42: test   %r8d,%r8d
0x00007fe759199d45: je     0x00007fe759199d76  ;*irem
                                               ; - org.sample.ScalaBench::isEvenlyDivisible@11 (line 52)
                                               ; - org.sample.ScalaBench::run@10 (line 45)
0x00007fe759199d47: mov    %ecx,%eax
0x00007fe759199d49: cmp    $0x80000000,%eax
0x00007fe759199d4e: jne    0x00007fe759199d58
0x00007fe759199d50: xor    %edx,%edx
0x00007fe759199d52: cmp    $0xffffffffffffffff,%r8d
0x00007fe759199d56: je     0x00007fe759199d5c
0x00007fe759199d58: cltd   
0x00007fe759199d59: idiv   %r8d

...这是Java中类似的块:

...and this is similar block in Java:

0x00007f4a811848cf: movslq %ebp,%r10
0x00007f4a811848d2: mov    %ebp,%r9d
0x00007f4a811848d5: sar    $0x1f,%r9d
0x00007f4a811848d9: imul   $0x55555556,%r10,%r10
0x00007f4a811848e0: sar    $0x20,%r10
0x00007f4a811848e4: mov    %r10d,%r11d
0x00007f4a811848e7: sub    %r9d,%r11d         ;*irem
                                              ; - org.sample.JavaBench::isEvenlyDivisible@9 (line 63)
                                              ; - org.sample.JavaBench::isEvenlyDivisible@19 (line 63)
                                              ; - org.sample.JavaBench::run@10 (line 54)

请注意Java版本中的编译器使用了将整数余数计算转换为乘法和右移的技巧(参见Hacker's Delight,Ch.10,Sect.19)。当编译器检测到我们根据常量计算余数时,这是可能的,这表明Java版本达到了甜蜜的优化,但Scala版本却没有。你可以深入研究字节码反汇编来找出scalac干预了什么怪癖,但这个练习的重点在于代码生成中令人惊讶的细微差别被基准放大了很多。

Notice how in Java version the compiler employed the trick for translating integer remainder calculation into the multiplication and shifting right (see Hacker's Delight, Ch. 10, Sect. 19). This is possible when compiler detects we compute the remainder against the constant, which suggests Java version hit that sweet optimization, but Scala version did not. You can dig into the bytecode disassembly to figure out what quirk in scalac have intervened, but the point of this exercise is that surprising minute differences in code generation are magnified by benchmarks a lot.

PS非常适合 @tailrec ...

P.S. So much for @tailrec...

更新:对效果的更详尽说明: http://shipilev.net/blog/2014/java-scala-divided-we-失败/

UPDATE: A more thorough explanation of the effect: http://shipilev.net/blog/2014/java-scala-divided-we-fail/

这篇关于Scala的隐藏性能成本?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆