Java 8: First use of stream() or parallelStream() very slow - Usage in practice meaningful?


Question

In the last few days I ran some tests with external iteration, streams, and parallel streams in Java 8 and measured the execution times. I also read about the warm-up time that I have to take into account. But one question still remains.

The first time I call stream() or parallelStream() on a collection, the execution time is higher than for an external iteration. I already know that when I call stream() or parallelStream() repeatedly on the same collection and average the execution times, parallelStream() is indeed faster than the external iteration. But since in practice a collection is often iterated only once, I see only a disadvantage in using streams or parallel streams.

So my question is:

If I iterate a collection only once, is it a good idea to use stream() or parallelStream(), or will the execution time always be higher than for external iteration?

Recommended answer

Entirely coincidentally (apparently), Doug Lea, Brian Goetz, and several other folks have written a document called Stream Parallel Guidance. (This is only a draft.) It does have some useful discussion about when to use parallel vs. sequential streams.

A brief summary: a parallel stream is more expensive to start up than a sequential stream. If your workload is splittable, and you have multiple CPU cores that can be brought to bear on the problem, and if the per-element cost isn't unreasonably small, you'll get a parallel speedup with a sufficiently large workload. (How's that for a lot of conditionals?) Oh, and you also have to be careful about benchmarking.
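As a rough illustration of those conditions, here is a minimal sketch that runs the same pipeline sequentially and in parallel. The class name, the `work` function, and the element count are hypothetical choices made for this example; `work` merely stands in for a per-element cost that is not trivially small. The one-shot timings in `main` include JIT warm-up and class loading, so they are only indicative; a harness such as JMH is the usual way to measure this properly.

```java
import java.util.stream.LongStream;

class ParallelSketch {
    // Hypothetical per-element work: a small loop so each element costs more than a single addition.
    static long work(long n) {
        long acc = n;
        for (int i = 0; i < 100; i++) {
            acc = (acc * 31 + i) % 1_000_003;
        }
        return acc;
    }

    // Sequential pipeline over the range [0, count).
    static long sequentialSum(long count) {
        return LongStream.range(0, count).map(ParallelSketch::work).sum();
    }

    // Same pipeline, switched to parallel execution on the common fork/join pool.
    static long parallelSum(long count) {
        return LongStream.range(0, count).parallel().map(ParallelSketch::work).sum();
    }

    public static void main(String[] args) {
        long count = 2_000_000; // assumed "sufficiently large" workload; adjust to taste

        long t0 = System.nanoTime();
        long seq = sequentialSum(count);
        long t1 = System.nanoTime();
        long par = parallelSum(count);
        long t2 = System.nanoTime();

        System.out.println("results equal: " + (seq == par));
        System.out.printf("sequential: %d ms, parallel: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Shrinking `count` to a handful of elements, or replacing `work` with the identity function, is a quick way to see the parallel version lose its advantage.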

StackOverflow is littered with questions that attempt to add up a few integers in parallel and then claim that parallel streams are no good because they don't provide any speedup. I won't even bother linking to them.

Now, you had asked about "external iteration" (basically a for-loop) vs. streams, parallel or sequential. I think it's important to consider parallel vs. sequential streams first, as I've done above. This will help inform further decisions. Clearly, if there is a possibility you'll need to run things in parallel, then you should probably go with streams, even if you initially start sequentially.

Even if you don't intend to go parallel, there are still a number of considerations between for-loops and sequential streams. There is a certain amount of overhead of streams compared to conventional loops -- especially for-loops over an array. But this is usually amortized over the workload. Even if the collection is iterated only once, amortization of the setup can occur if the number of elements in the collection is sufficiently large. For example, if the collection has 10 elements, the extra setup cost of a stream probably isn't worth it. If the collection has 10,000 elements, it might be a different story.

For-loops over arrays are particularly fast because the only "setup" is initializing loop counters and limit values in registers. JIT compilers can bring many loop optimizations to bear as well. It's rare for sequential streams to beat a for-loop over an array, though it can happen.
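For comparison, the two forms look like this side by side. This is only a sketch with hypothetical class and method names; it shows that both compute the same sum and makes no claim about which is faster on a given JVM.

```java
import java.util.Arrays;

class ArraySumSketch {
    // Conventional indexed loop: the only setup is the counter and the array length.
    static long sumWithLoop(int[] values) {
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Sequential stream over the same array.
    static long sumWithStream(int[] values) {
        // asLongStream() widens each int to long so the sum cannot overflow int.
        return Arrays.stream(values).asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] values = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(sumWithLoop(values));   // prints 31
        System.out.println(sumWithStream(values)); // prints 31
    }
}
```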

For-loops over collections usually involve creating an iterator and thus have somewhat more overhead than array-based loops. In particular, each iteration on an iterator involves method calls to hasNext and next, whereas a stream can get each element with a single method call. For this reason there are times when a sequential stream can beat an iterator-based loop (given the right per-element workload, a sufficiently large number of elements, etc.). So even though there is some setup cost for a stream, there is also the possibility that it might end up running faster than a conventional for-loop.
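To make the difference concrete, the sketch below spells out the iterator calls that an enhanced for-loop over a collection performs and puts the equivalent sequential stream next to it. The class and method names are hypothetical, and the example only demonstrates that the results match; whether the stream version is actually faster depends on the workload and the JVM.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorVsStreamSketch {
    // What an enhanced for-loop over a collection does: hasNext() and next() for every element.
    static int totalLengthWithIterator(List<String> words) {
        int total = 0;
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {
            total += it.next().length();
        }
        return total;
    }

    // Sequential stream: the library drives the traversal through the collection's Spliterator.
    static int totalLengthWithStream(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(totalLengthWithIterator(words)); // prints 14
        System.out.println(totalLengthWithStream(words));   // prints 14
    }
}
```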

Finally, performance isn't the only consideration. There is also readability and maintainability. The streams and lambda stuff may initially be new and unfamiliar, but it has great potential to simplify and clean up code. See my answer to another question, for example.
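As a small, hypothetical illustration of the readability point (this is not the example from the linked answer), here is the same filter-and-transform task written as a loop and as a stream pipeline. The names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReadabilitySketch {
    // Loop version: the filtering, transformation, and accumulation are interleaved.
    static List<String> longWordsUpperLoop(List<String> words, int minLength) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (word.length() >= minLength) {
                result.add(word.toUpperCase());
            }
        }
        return result;
    }

    // Stream version: each step of the pipeline is named by the operation that performs it.
    static List<String> longWordsUpperStream(List<String> words, int minLength) {
        return words.stream()
                .filter(word -> word.length() >= minLength)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("for", "stream", "loop", "parallel");
        System.out.println(longWordsUpperLoop(words, 5));   // prints [STREAM, PARALLEL]
        System.out.println(longWordsUpperStream(words, 5)); // prints [STREAM, PARALLEL]
    }
}
```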
