Quickly degrading stream throughput with chained operations?
Question
I expected that simple intermediate stream operations, such as limit(), would have very little overhead. But the difference in throughput between these examples is actually significant:
final long MAX = 5_000_000_000L;
LongStream.rangeClosed(0, MAX)
.count();
// throughput: 1.7 bn values/second
LongStream.rangeClosed(0, MAX)
.limit(MAX)
.count();
// throughput: 780m values/second
LongStream.rangeClosed(0, MAX)
.limit(MAX)
.limit(MAX)
.count();
// throughput: 130m values/second
LongStream.rangeClosed(0, MAX)
.limit(MAX)
.limit(MAX)
.limit(MAX)
.count();
// throughput: 65m values/second
I am curious: what is the reason for the quickly degrading throughput? Is this a consistent pattern with chained stream operations, or an artifact of my test setup? (I have not used JMH so far, just a quick experiment with a stopwatch.)
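To make the comparison reproducible without JMH, a minimal stopwatch harness along these lines can be used. This is a sketch, not a rigorous benchmark: MAX is deliberately much smaller than the question's 5_000_000_000L, the warm-up is crude, and sum() is used instead of count() because newer JDKs can compute count() from the known size of a sized stream without traversing it at all, which would hide the cost being measured.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class LimitOverhead {
    // Much smaller than the question's 5_000_000_000L so a run finishes quickly;
    // the relative cost of each extra limit() stage is still visible.
    static final long MAX = 50_000_000L;

    // Times one traversal and prints an approximate throughput figure.
    static void time(String label, LongSupplier task) {
        long start = System.nanoTime();
        long result = task.getAsLong();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%-10s sum=%d, ~%.0fm values/second%n",
                label, result, MAX / seconds / 1e6);
    }

    public static void main(String[] args) {
        // Crude JIT warm-up; a real benchmark would use JMH instead.
        for (int i = 0; i < 3; i++) {
            LongStream.rangeClosed(0, MAX).limit(MAX).sum();
        }
        // sum() forces a full traversal on every JDK version, unlike count().
        time("no limit", () -> LongStream.rangeClosed(0, MAX).sum());
        time("1x limit", () -> LongStream.rangeClosed(0, MAX).limit(MAX).sum());
        time("2x limit", () -> LongStream.rangeClosed(0, MAX).limit(MAX).limit(MAX).sum());
    }
}
```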
Answer
limit() results in a slice being made of the stream, with a wrapping spliterator (used for parallel operation). In a word: inefficient. That is a large overhead for what is effectively a no-op here. And that two consecutive limit() calls result in two slices is a shame.
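The extra slice stage is visible from outside the pipeline: asking a plain range stream for its spliterator yields the range's own spliterator, while a stream with a limit() stage has to hand out a wrapping spliterator instead. The exact class names are JDK internals and vary by version, so the sketch below only shows that they differ:

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class SliceSpliterators {
    public static void main(String[] args) {
        // A plain range stream exposes its primitive range spliterator directly.
        Spliterator.OfLong plain = LongStream.rangeClosed(0, 10).spliterator();
        // limit() inserts a slice stage, so the stream must wrap the source
        // spliterator to enforce the bound.
        Spliterator.OfLong sliced = LongStream.rangeClosed(0, 10).limit(10).spliterator();
        System.out.println(plain.getClass().getName());
        System.out.println(sliced.getClass().getName());
    }
}
```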
You should take a look at the implementation of IntStream.limit.
As Streams are still relatively new, optimization should come last, once production code exists. Chaining limit() three times seems a bit far-fetched anyway.
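If several upper bounds really do accumulate, say from different layers of code, a cheap workaround is to collapse them into a single limit() before building the pipeline, so only one slice stage is created. The bounds a, b, and c here are hypothetical stand-ins for such layered limits:

```java
import java.util.stream.LongStream;

public class CollapsedLimit {
    public static void main(String[] args) {
        long MAX = 1_000_000L;
        // Hypothetical bounds that would otherwise each become a limit() stage.
        long a = MAX, b = MAX, c = MAX;
        // Take the smallest bound up front: one slice instead of three.
        long effective = Math.min(a, Math.min(b, c));
        long n = LongStream.rangeClosed(0, MAX).limit(effective).count();
        System.out.println(n); // prints 1000000
    }
}
```

Since limit(n).limit(m) is equivalent to limit(Math.min(n, m)), this changes nothing about the result, only the number of pipeline stages.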