为什么lambda IntStream.anyMatch()比天真的实现慢10? [英] Why lambda IntStream.anyMatch() is 10 slower than naive implementation?
问题描述
我最近在分析我的代码并发现了一个有趣的瓶颈。以下是基准:
I was recently profiling my code and found one interesting bottleneck in it. Here is benchmark :
@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
public class Contains {
private int[] ar = new int[] {1,2,3,4,5,6,7};
private int val = 5;
@Benchmark
public boolean naive() {
return contains(ar, val);
}
@Benchmark
public boolean lambdaArrayStreamContains() {
return Arrays.stream(ar).anyMatch(i -> i == val);
}
@Benchmark
public boolean lambdaIntStreamContains() {
return IntStream.of(ar).anyMatch(i -> i == val);
}
private static boolean contains(int[] ar, int value) {
for (int arVal : ar) {
if (arVal == value) {
return true;
}
}
return false;
}
}
结果:
Benchmark Mode Cnt Score Error Units
Contains.lambdaArrayStreamContains thrpt 10 22867.962 ± 1049.649 ops/s
Contains.lambdaIntStreamContains thrpt 10 22983.800 ± 593.580 ops/s
Contains.naive thrpt 10 228002.406 ± 8591.186 ops/s
如果显示Array包含通过lambda的操作比使用简单循环的naive实现慢10倍。我知道lambdas应该慢一点。但是10次?我做错了lambda或者这是java的一些问题吗?
If shows that Array contains operation via lambda is 10 times slower than naive implementation with simple loop. I knew that lambdas should be a bit slower. But 10 times? Am I doing wrong lambda or this is some issue with java?
推荐答案
你的基准测试实际上没有测量 anyMatch
性能,而不是流开销。与一个非常简单的操作(如五元素数组查找)相比,这种开销可能会显得很大。
Your benchmark does not actually measure anyMatch
performance, but rather a stream overhead. This overhead can appear significant when comparing to a very simple operation like a five-element array lookup.
如果我们从相对数字到绝对数字进行切换,那么减速看起来并不那么糟糕。让我们测量延迟而不是吞吐量,以获得更清晰的图像。我省略了 lambdaIntStream
基准,因为它与 lambdaArrayStream
完全相同。
The slowdown will not look so terrible if we swtich from relative to absolute numbers. Let's measure latency instead of throughput for a clearer picture. I've omitted lambdaIntStream
benchmark since it works exactly the same way as lambdaArrayStream
.
Benchmark Mode Cnt Score Error Units
Contains.lambdaArrayStream avgt 5 53,242 ± 2,034 ns/op
Contains.naive avgt 5 5,876 ± 0,404 ns/op
5.8 ns大约是2.4 GHz CPU的14个周期。工作量非常小,任何额外的周期都会很明显。那么流操作的开销是多少?
5.8 ns is roughly 14 cycles of a 2.4 GHz CPU. The workload is so small that any extra cycle will be noticeable. So what is the overhead of stream operations?
现在用<$ c $重新运行基准测试c> -prof gc profiler。它将显示堆分配的数量:
Now rerun the benchmark with -prof gc
profiler. It will show the amount of heap allocations:
Benchmark Mode Cnt Score Error Units
Contains.lambdaArrayStream:·gc.alloc.rate.norm avgt 5 152,000 ± 0,001 B/op
Contains.naive:·gc.alloc.rate.norm avgt 5 ≈ 10⁻⁵ B/op
lambdaArrayStream
每次迭代分配152个字节,而天真
基准没有任何分配。当然,分配不是免费的:至少有5个对象构造成支持 anyMatch
,每个对象需要几纳秒:
lambdaArrayStream
allocates 152 bytes per iteration while naive
benchmark allocates nothing. Of course, allocation is not free: there are at least 5 objects constructed to support anyMatch
, and each takes several nanoseconds:
- Lambda
i - > i == val
- IntPipeline.Head
- Spliterators.IntArraySpliterator
- MatchOps.MatchOp
- MatchOps.MatchSink
- Lambda
i -> i == val
- IntPipeline.Head
- Spliterators.IntArraySpliterator
- MatchOps.MatchOp
- MatchOps.MatchSink
java.util.stream
实现有点复杂,因为它必须支持流源,中间和终端操作的所有组合。如果您在基准测试中查看 anyMatch
的调用堆栈,您会看到类似的内容:
java.util.stream
implementation is a bit complicated since it must support all combinations of stream sources, intermediate and terminal operations. If you look at the call stack of anyMatch
in your benchmark, you'll see something like that:
at bench.Contains.lambda$lambdaArrayStream$0(Contains.java:24)
at java.util.stream.MatchOps$2MatchSink.accept(MatchOps.java:119)
at java.util.Spliterators$IntArraySpliterator.tryAdvance(Spliterators.java:1041)
at java.util.stream.IntPipeline.forEachWithCancel(IntPipeline.java:162)
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.IntPipeline.anyMatch(IntPipeline.java:477)
at bench.Contains.lambdaArrayStream(Contains.java:23)
并非所有这些方法调用都可以内联。此外,JVM将内联限制为9个级别,但在这里我们看到更深层次的调用堆栈。如果我们用 -XX:MaxInlineLevel = 20
覆盖限制,则分数会变得更好:
Not all of these method calls can be inlined. Furthermore, JVM limits inlining to 9 levels, but here we see the deeper call stack. If we override the limit with -XX:MaxInlineLevel=20
the score will become a bit better:
Benchmark Mode Cnt Score Error Units
Contains.lambdaArrayStream avgt 5 33,294 ± 0,367 ns/op (was 53,242)
Contains.naive avgt 5 5,822 ± 0,207 ns/op
循环优化
for
对数组进行迭代是一个微不足道的计数循环。 JVM可以在这里应用各种循环优化:循环剥离,循环展开等。这不适用于而
-kind循环在 forEachWithCancel
方法,用于遍历IntStream。循环优化的效果可以用 -XX来衡量:LoopUnrollLimit = 0 -XX:-UseLoopPredicate
:
Loop optimizations
for
iteration over an array is a trivial counted loop. JVM can apply a wide range of loop optimizatons here: loop peeling, loop unrolling etc. This does not work for a while
-kind loop in forEachWithCancel
method, which is used to traverse an IntStream. The effect of loop optimizations can be measured with -XX:LoopUnrollLimit=0 -XX:-UseLoopPredicate
:
Benchmark Mode Cnt Score Error Units
Contains.lambdaArrayStream avgt 5 33,153 ± 0,559 ns/op
Contains.naive avgt 5 9,853 ± 0,150 ns/op (was 5,876)
结论
是构造和遍历流的一些开销,但这是完全理解的,不能被视为错误。我不会说开销很大(甚至50 ns / op也不是那么多);但是,在这个特定的例子中,由于工作量极小,开销占主导地位。
Conclusions
There is some overhead to construct and to traverse a stream, but this is completely understood and cannot be considered a bug. I would not say the overhead is big (even 50 ns/op is not that much); however, in this particular example the overhead is dominant because of an extremely tiny workload.
这篇关于为什么lambda IntStream.anyMatch()比天真的实现慢10?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!