为什么Scala的尾递归慢于Java? [英] Why is Scala's tail recursion slower than that of Java?
问题描述
使用尾递归进行简单加法的Scala代码
Scala code for simple addition using tail recursion
def add(list : List[Int],sum:Int):Int = {
//Thread.dumpStack()
if (list.isEmpty) {
sum
} else {
val headVal = list.head
add(list.tail, sum + headVal)
}
}
下面是递归模式下添加的java代码。
Below is the java code for addition in recursive mode.
public static int add(List<Integer> list, Integer sum) {
// Thread.dumpStack();
if (list.isEmpty()) {
return sum;
} else {
int headVal = list.remove(0);
return add(list, sum + headVal);
}
}
Java版本的运行速度至少快10倍。这是1000条目。通过使用 System.nanoTime()
API之前和之后的测量时间。
The Java version runs at least 10 times faster. Ran this for 1000 entries. Measured time by using System.nanoTime()
API before and after.
Scala版本2.10,java版本Java 7两个运行的JVM属性相同。
Scala version 2.10, java version Java 7. Same JVM properties for both run.
推荐答案
首先Scala方法 add $ c您显示的$ c>不在(类)上下文中。如果在类中使用此方法,则不能应用尾递归优化,因为该方法既不是
final
也不是私人
。如果添加 @tailrec
,编译将失败。如果我用10000运行它会导致堆栈溢出。
First of all the Scala method add
you have shown is not in a (class) context. If you have this method in a class, the tail recursion optimization cannot be applied, since the method is neither final
nor private
. If you add @tailrec
, compilation will fail. If I run it with 10000, it results in a stack overflow.
至于Java版本:Java版本使用的是 mutable List
:头/尾分解改变基础列表。
所以在总结之后你不能再使用那个列表,因为它是空的。
As for the Java version: The Java version is using a mutable List
: The head/tail decomposition alters the underlying list.
So after summing you can't use that list any longer, since it is empty.
此外列表$ c Scala中的$ c>与Java
List
具有完全不同的含义; Scala列表用于头/尾分解。据我所知java.util.List没有tail方法,Java编译器也没有应用tailrec优化,因此比较是不公平的。
Further the List
in Scala has a completely different meaning than a Java List
; the Scala list is meant for head/tail decomposition. As far as I know java.util.List has no tail method, and neither does the Java compiler apply tailrec optimizations, so the comparision is "unfair".
无论如何,我已经在不同的场景下运行了一些基于JMH的测试。
Anyway, I have run some JMH-based tests on different scenarios.
你可以真正比较的两个场景是Scala while和Java for。它们既不使用OOP也不使用函数式编程,这只是当务之急。
The only two scenarios you can really compare are "Scala while" and "Java for". They neither use OOP nor functional programming, it's just imperative.
五种不同Scala场景的结果
(请滚动到右侧,最后一栏有一个小描述)
(Please scroll to the right, there is a small description in the last column)
Benchmark Mode Samples Mean Mean error Units
a.b.s.b.Benchmark5a.run thrpt 10 238,515 7,769 ops/ms Like in the question, but with @tailrec
a.b.s.b.Benchmark5b.run thrpt 10 130,202 2,232 ops/ms Using List.sum
a.b.s.b.Benchmark5c.run thrpt 10 2756,132 29,920 ops/ms while (no List, vars, imperative)
a.b.s.b.Benchmark5d.run thrpt 10 237,286 2,203 ops/ms tailrec version with pattern matching
a.b.s.b.Benchmark5e.run thrpt 10 214,719 2,483 ops/ms Like in the question (= no tailrec opt)
- 5a和5e是这样的在问题中; 5a与
@tailrec
。 - 5b:
List.sum
:非常慢 - 5c:不是什么大惊喜命令式版本是最快的。
- 5d使用模式匹配而不是
如果
(那将是我的风格) ,我正在添加这个来源: - 5a and 5e are like in the question; 5a with
@tailrec
. - 5b:
List.sum
: Was very slow - 5c: Not a big surprise, the imperative version is the fastest one.
- 5d uses pattern matching instead of
if
(that would have been "my style"), I am adding the source of this one:
package app.benchmark.scala.benchmark5
import scala.annotation._
import org.openjdk.jmh.annotations.GenerateMicroBenchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import org.openjdk.jmh.runner.Runner
import org.openjdk.jmh.runner.RunnerException
import org.openjdk.jmh.runner.options.Options
import org.openjdk.jmh.runner.options.OptionsBuilder
@State(Scope.Benchmark)
object BenchmarkState5d {
val list = List.range(1, 1000)
}
class Benchmark5d {
private def add(list : List[Int]): Int = {
@tailrec
def add(list : List[Int], sum: Int): Int = {
list match {
case Nil =>
sum
case h :: t =>
add(t, h + sum)
}
}
add(list, 0)
}
@GenerateMicroBenchmark
def run() = {
add(BenchmarkState5d.list)
}
}
三种Java方案
Benchmark Mode Samples Mean Mean error Units
a.b.j.b.Benchmark5a.run thrpt 10 40,437 0,532 ops/ms mutable (rebuilds the list in each iteration)
a.b.j.b.Benchmark5b.run thrpt 10 0,450 0,008 ops/ms subList
a.b.j.b.Benchmark5c.run thrpt 10 2735,951 29,177 ops/ms for
如果你真的想要比较函数式编程风格的含义(=不可变的,尾递归,头/尾分解),比Java版慢五倍。
If you really want to compare in the meaning of functional programming style (= immutable, tail recursion, head/tail decomposition), than the Java version is five times slower.
正如Marko Topolnik的评论所指出的那样:
As pointed out in a comment by Marko Topolnik:
subList
不会复制尾部,但是当它应用于LinkedList
时,它会做一些比较糟糕的事情:它包装了原始的列出并使用偏移量来容纳语义。结果是O(n)递归算法变为O(n2)---就像重复复制尾部一样。另外,包装器会增加,因此您最终会将列表包裹一千次。绝对不能与头/尾列表相比
subList
doesn't copy the tail, but it does something comparably bad, when applied to aLinkedList
: it wraps the original list and uses an offset to accomodate the semantics. The result is that an O(n) recursive algorithm becomes O(n2)---the same as if the tail was repeatedly copied. Plus, wrappers accrete so you end up with the list wrapped a thousand times. Definitely not comparable to a head/tail list
public class Benchmark5a {
public static int add(List<Integer> list, Integer sum) {
if (list.isEmpty()) {
return sum;
} else {
int headVal = list.remove(0);
return add(list, sum + headVal);
}
}
@GenerateMicroBenchmark
public long run() {
final List<Integer> list = new LinkedList<Integer>();
for(int i = 0; i < 1000; i++) {
list.add(i);
}
return add(list, 0);
}
public static void main(String[] args) {
System.out.println(new Benchmark5a().run());
}
}
@State(Scope.Benchmark)
class BenchmarkState5b {
public final static List<Integer> list = new LinkedList<Integer>();
static {
for(int i = 0; i < 1000; i++) {
list.add(i);
}
}
}
public class Benchmark5b {
public static int add(List<Integer> list, int sum) {
if (list.isEmpty()) {
return sum;
} else {
int headVal = list.get(0);
return add(list.subList(1, list.size()), sum + headVal);
}
}
@GenerateMicroBenchmark
public long run() {
return add(BenchmarkState5b.list, 0);
}
public static void main(String[] args) {
System.out.println(new Benchmark5b().run());
}
}
Scala结果详情
(所有结果仅显示最后一个方案,以及整体结果)
(all results only show the last scenario, and the overall results)
[...]
# VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
# VM options: <none>
# Fork: 1 of 1
# Warmup: 3 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: app.benchmark.scala.benchmark5.Benchmark5e.run
# Warmup Iteration 1: 166,153 ops/ms
# Warmup Iteration 2: 215,242 ops/ms
# Warmup Iteration 3: 216,632 ops/ms
Iteration 1: 215,526 ops/ms
Iteration 2: 213,720 ops/ms
Iteration 3: 213,967 ops/ms
Iteration 4: 215,468 ops/ms
Iteration 5: 216,247 ops/ms
Iteration 6: 217,514 ops/ms
Iteration 7: 215,503 ops/ms
Iteration 8: 211,969 ops/ms
Iteration 9: 212,989 ops/ms
Iteration 10: 214,291 ops/ms
Result : 214,719 ±(99.9%) 2,483 ops/ms
Statistics: (min, avg, max) = (211,969, 214,719, 217,514), stdev = 1,642
Confidence interval (99.9%): [212,236, 217,202]
Benchmark Mode Samples Mean Mean error Units
a.b.s.b.Benchmark5a.run thrpt 10 238,515 7,769 ops/ms
a.b.s.b.Benchmark5b.run thrpt 10 130,202 2,232 ops/ms
a.b.s.b.Benchmark5c.run thrpt 10 2756,132 29,920 ops/ms
a.b.s.b.Benchmark5d.run thrpt 10 237,286 2,203 ops/ms
a.b.s.b.Benchmark5e.run thrpt 10 214,719 2,483 ops/ms
Java结果详情
# VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
# VM options: <none>
# Fork: 1 of 1
# Warmup: 3 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: app.benchmark.java.benchmark5.Benchmark5c.run
# Warmup Iteration 1: 2777,495 ops/ms
# Warmup Iteration 2: 2888,040 ops/ms
# Warmup Iteration 3: 2692,851 ops/ms
Iteration 1: 2737,169 ops/ms
Iteration 2: 2745,368 ops/ms
Iteration 3: 2754,105 ops/ms
Iteration 4: 2706,131 ops/ms
Iteration 5: 2721,593 ops/ms
Iteration 6: 2769,261 ops/ms
Iteration 7: 2734,461 ops/ms
Iteration 8: 2741,494 ops/ms
Iteration 9: 2740,012 ops/ms
Iteration 10: 2709,915 ops/ms
Result : 2735,951 ±(99.9%) 29,177 ops/ms
Statistics: (min, avg, max) = (2706,131, 2735,951, 2769,261), stdev = 19,299
Confidence interval (99.9%): [2706,774, 2765,128]
Benchmark Mode Samples Mean Mean error Units
a.b.j.b.Benchmark5a.run thrpt 10 40,437 0,532 ops/ms
a.b.j.b.Benchmark5b.run thrpt 10 0,450 0,008 ops/ms
a.b.j.b.Benchmark5c.run thrpt 10 2735,951 29,177 ops/ms
更新:添加了另一个Java场景5d使用 ArrayList
Update: Added another Java scenario 5d using ArrayList
Benchmark Mode Samples Mean Mean error Units
a.b.j.b.Benchmark5a.run thrpt 10 34,931 0,504 ops/ms
a.b.j.b.Benchmark5b.run thrpt 10 0,430 0,005 ops/ms
a.b.j.b.Benchmark5c.run thrpt 10 2610,085 9,664 ops/ms
a.b.j.b.Benchmark5d.run thrpt 10 56,693 1,218 ops/ms
这篇关于为什么Scala的尾递归慢于Java?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!