Why is Scala's tail recursion slower than that of Java?


Problem description

Scala code for simple addition using tail recursion:

def add(list: List[Int], sum: Int): Int = {
    //Thread.dumpStack()
    if (list.isEmpty) {
        sum
    } else {
        val headVal  = list.head
        add(list.tail, sum + headVal)
    }
}

Below is the Java code for addition in recursive mode:

public static int add(List<Integer> list, Integer sum) {
    // Thread.dumpStack();
    if (list.isEmpty()) {
        return sum;
    } else {
        int headVal = list.remove(0);
        return add(list, sum + headVal);
    }
}

The Java version runs at least 10 times faster. I ran this for 1000 entries and measured the time with System.nanoTime() before and after the call.

Scala version 2.10, Java version 7. Same JVM properties for both runs.
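
The exact measurement harness is not shown in the question; a minimal sketch of such a before/after System.nanoTime() measurement of the Scala version might look like this (names are illustrative):

object NaiveTiming {
  def add(list: List[Int], sum: Int): Int =
    if (list.isEmpty) sum
    else add(list.tail, sum + list.head)

  def main(args: Array[String]): Unit = {
    val data   = List.range(1, 1001)        // 1000 entries, as in the question
    val start  = System.nanoTime()          // timestamp before the call
    val result = add(data, 0)
    val tookNs = System.nanoTime() - start  // elapsed wall-clock time in nanoseconds
    println(s"sum = $result, took $tookNs ns")
  }
}

Note that a single nanoTime measurement like this includes JIT warm-up and other noise, which is one reason the answer below switches to JMH.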

Recommended answer

First of all, the Scala method add you have shown is not in a (class) context. If you put this method in a class, the tail-recursion optimization cannot be applied, since the method is neither final nor private. If you add @tailrec there, compilation will fail. If I run it with 10000 entries, it results in a stack overflow.
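
For illustration, a minimal sketch of a shape the optimization can be applied to: put the method in an object (or mark it final or private inside a class) and annotate it with @tailrec, so the compiler fails the build if it cannot eliminate the recursive call:

import scala.annotation.tailrec

object Adder {
  // On an object (or as a final/private method of a class) the call cannot be
  // overridden, so scalac is allowed to compile the recursion into a loop.
  @tailrec
  def add(list: List[Int], sum: Int): Int =
    if (list.isEmpty) sum
    else add(list.tail, sum + list.head)
}

In this shape, Adder.add(List.range(1, 10001), 0) no longer overflows the stack, because the tail call is turned into a jump.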

As for the Java version: it is using a mutable List; the head/tail decomposition alters the underlying list, so after summing you cannot use that list any longer, since it is empty.

Further, a List in Scala has a completely different meaning from a Java List; the Scala List is designed for head/tail decomposition. As far as I know, java.util.List has no tail method, and the Java compiler does not apply tail-call optimization either, so the comparison is unfair.

Anyway, I have run some JMH-based tests on different scenarios.

The only two scenarios you can really compare are "Scala while" and "Java for". They use neither OOP nor functional programming; they are just imperative.
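
The source of the "Scala while" scenario (5c) is not listed below; presumably it is something along these lines (a sketch with illustrative names, not the benchmark's actual code):

// Imperative summation: no List, only vars and a while loop
def addWhile(values: Array[Int]): Int = {
  var sum = 0
  var i = 0
  while (i < values.length) {
    sum += values(i)
    i += 1
  }
  sum
}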

Results of the five different Scala scenarios

(Please scroll to the right, there is a short description in the last column)

Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.s.b.Benchmark5a.run    thrpt        10      238,515        7,769   ops/ms Like in the question, but with @tailrec
a.b.s.b.Benchmark5b.run    thrpt        10      130,202        2,232   ops/ms Using List.sum
a.b.s.b.Benchmark5c.run    thrpt        10     2756,132       29,920   ops/ms while (no List, vars, imperative)
a.b.s.b.Benchmark5d.run    thrpt        10      237,286        2,203   ops/ms tailrec version with pattern matching
a.b.s.b.Benchmark5e.run    thrpt        10      214,719        2,483   ops/ms Like in the question (= no tailrec opt)




    • 5a and 5e are like in the question; 5a with @tailrec.
    • 5b: List.sum: was very slow (a sketch of this scenario follows after the 5d source below).
    • 5c: Not a big surprise, the imperative version is the fastest one.
    • 5d uses pattern matching instead of if (that would have been "my style"); I am adding the source of this one:

      package app.benchmark.scala.benchmark5
      
      import scala.annotation._
      import org.openjdk.jmh.annotations.GenerateMicroBenchmark
      import org.openjdk.jmh.annotations.Scope
      import org.openjdk.jmh.annotations.State
      import org.openjdk.jmh.runner.Runner
      import org.openjdk.jmh.runner.RunnerException
      import org.openjdk.jmh.runner.options.Options
      import org.openjdk.jmh.runner.options.OptionsBuilder
      
      @State(Scope.Benchmark)
      object BenchmarkState5d {
        val list = List.range(1, 1000)
      }
      
      class Benchmark5d {
        private def add(list : List[Int]): Int = {
          @tailrec
          def add(list : List[Int], sum: Int): Int = {
            list match {
              case Nil =>
                sum
              case h :: t =>
                add(t, h + sum)
            }
          }
          add(list, 0)
        }
      
        @GenerateMicroBenchmark
        def run() = {
          add(BenchmarkState5d.list)
        }
      }
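
      The 5b source is not shown; presumably it is essentially the built-in List.sum applied to the same kind of list, along these lines (a sketch that reuses the imports and the state object from the 5d listing above):

      class Benchmark5b {
        @GenerateMicroBenchmark
        def run() = BenchmarkState5d.list.sum
      }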
      

      The three Java scenarios

      Benchmark                   Mode   Samples         Mean   Mean error    Units
      a.b.j.b.Benchmark5a.run    thrpt        10       40,437        0,532   ops/ms mutable (rebuilds the list in each iteration)
      a.b.j.b.Benchmark5b.run    thrpt        10        0,450        0,008   ops/ms subList
      a.b.j.b.Benchmark5c.run    thrpt        10     2735,951       29,177   ops/ms for
      

      If you really want to compare in the sense of functional programming style (immutable data, tail recursion, head/tail decomposition), then the Java version is five times slower.

      As pointed out in a comment by Marko Topolnik:


      subList doesn't copy the tail, but it does something comparably bad when applied to a LinkedList: it wraps the original list and uses an offset to accommodate the semantics. The result is that an O(n) recursive algorithm becomes O(n²), the same as if the tail were repeatedly copied. Plus, the wrappers accrete, so you end up with the list wrapped a thousand times. Definitely not comparable to a head/tail list.



      public class Benchmark5a {
          public static int add(List<Integer> list, Integer sum) {
              if (list.isEmpty()) {
                  return sum;
              } else {
                  int headVal = list.remove(0);
                  return add(list, sum + headVal);
              }
          }
      
          @GenerateMicroBenchmark
          public long run() {
              final List<Integer> list = new LinkedList<Integer>();
              for(int i = 0; i < 1000; i++) {
                  list.add(i);
              }
              return add(list, 0);
          }
      
          public static void main(String[] args) {
              System.out.println(new Benchmark5a().run());
          }
      }
      







      @State(Scope.Benchmark)
      class BenchmarkState5b {
          public final static List<Integer> list = new LinkedList<Integer>();
      
          static {
              for(int i = 0; i < 1000; i++) {
                  list.add(i);
              }
          }
      }
      
      public class Benchmark5b {
          public static int add(List<Integer> list, int sum) {
              if (list.isEmpty()) {
                  return sum;
              } else {
                  int headVal = list.get(0);
                  return add(list.subList(1, list.size()), sum + headVal);
              }
          }
      
          @GenerateMicroBenchmark
          public long run() {
              return add(BenchmarkState5b.list, 0);
          }
      
          public static void main(String[] args) {
              System.out.println(new Benchmark5b().run());
          }
      }
      






      Scala result details

      (all results show only the last scenario, plus the overall results)

      [...]
      
      # VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
      # VM options: <none>
      # Fork: 1 of 1
      # Warmup: 3 iterations, 1 s each
      # Measurement: 10 iterations, 1 s each
      # Threads: 1 thread, will synchronize iterations
      # Benchmark mode: Throughput, ops/time
      # Benchmark: app.benchmark.scala.benchmark5.Benchmark5e.run
      # Warmup Iteration   1: 166,153 ops/ms
      # Warmup Iteration   2: 215,242 ops/ms
      # Warmup Iteration   3: 216,632 ops/ms
      Iteration   1: 215,526 ops/ms
      Iteration   2: 213,720 ops/ms
      Iteration   3: 213,967 ops/ms
      Iteration   4: 215,468 ops/ms
      Iteration   5: 216,247 ops/ms
      Iteration   6: 217,514 ops/ms
      Iteration   7: 215,503 ops/ms
      Iteration   8: 211,969 ops/ms
      Iteration   9: 212,989 ops/ms
      Iteration  10: 214,291 ops/ms
      
      Result : 214,719 ±(99.9%) 2,483 ops/ms
        Statistics: (min, avg, max) = (211,969, 214,719, 217,514), stdev = 1,642
        Confidence interval (99.9%): [212,236, 217,202]
      
      
      Benchmark                   Mode   Samples         Mean   Mean error    Units
      a.b.s.b.Benchmark5a.run    thrpt        10      238,515        7,769   ops/ms
      a.b.s.b.Benchmark5b.run    thrpt        10      130,202        2,232   ops/ms
      a.b.s.b.Benchmark5c.run    thrpt        10     2756,132       29,920   ops/ms
      a.b.s.b.Benchmark5d.run    thrpt        10      237,286        2,203   ops/ms
      a.b.s.b.Benchmark5e.run    thrpt        10      214,719        2,483   ops/ms
      






      Java result details

      # VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
      # VM options: <none>
      # Fork: 1 of 1
      # Warmup: 3 iterations, 1 s each
      # Measurement: 10 iterations, 1 s each
      # Threads: 1 thread, will synchronize iterations
      # Benchmark mode: Throughput, ops/time
      # Benchmark: app.benchmark.java.benchmark5.Benchmark5c.run
      # Warmup Iteration   1: 2777,495 ops/ms
      # Warmup Iteration   2: 2888,040 ops/ms
      # Warmup Iteration   3: 2692,851 ops/ms
      Iteration   1: 2737,169 ops/ms
      Iteration   2: 2745,368 ops/ms
      Iteration   3: 2754,105 ops/ms
      Iteration   4: 2706,131 ops/ms
      Iteration   5: 2721,593 ops/ms
      Iteration   6: 2769,261 ops/ms
      Iteration   7: 2734,461 ops/ms
      Iteration   8: 2741,494 ops/ms
      Iteration   9: 2740,012 ops/ms
      Iteration  10: 2709,915 ops/ms
      
      Result : 2735,951 ±(99.9%) 29,177 ops/ms
        Statistics: (min, avg, max) = (2706,131, 2735,951, 2769,261), stdev = 19,299
        Confidence interval (99.9%): [2706,774, 2765,128]
      
      
      Benchmark                   Mode   Samples         Mean   Mean error    Units
      a.b.j.b.Benchmark5a.run    thrpt        10       40,437        0,532   ops/ms
      a.b.j.b.Benchmark5b.run    thrpt        10        0,450        0,008   ops/ms
      a.b.j.b.Benchmark5c.run    thrpt        10     2735,951       29,177   ops/ms
      






      Update: Added another Java scenario 5d using ArrayList

      Benchmark                   Mode   Samples         Mean   Mean error    Units
      a.b.j.b.Benchmark5a.run    thrpt        10       34,931        0,504   ops/ms
      a.b.j.b.Benchmark5b.run    thrpt        10        0,430        0,005   ops/ms
      a.b.j.b.Benchmark5c.run    thrpt        10     2610,085        9,664   ops/ms
      a.b.j.b.Benchmark5d.run    thrpt        10       56,693        1,218   ops/ms
      
