Java Stream API: why the distinction between sequential and parallel execution mode?


Question

From the Stream javadoc:


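Stream pipelines may execute either sequentially or in parallel. This execution mode is a property of the stream. Streams are created with an initial choice of sequential or parallel execution.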
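A short sketch of how that property shows up in the API, assuming Java 9+ (for `List.of`): `Collection.stream()` returns a sequential stream, `Collection.parallelStream()` returns a possibly parallel one, and `parallel()`/`sequential()` on `BaseStream` set the mode for the whole pipeline, with the most recent call before the terminal operation taking effect. The class and method names are made up for illustration.

```java
import java.util.List;
import java.util.stream.Stream;

public class ExecutionModeDemo {
    static final List<Integer> NUMBERS = List.of(1, 2, 3, 4, 5);

    // Collection.stream() starts out sequential.
    static boolean defaultStreamIsParallel() {
        return NUMBERS.stream().isParallel();
    }

    // Collection.parallelStream() asks for a (possibly) parallel stream.
    static boolean parallelStreamIsParallel() {
        return NUMBERS.parallelStream().isParallel();
    }

    // The mode is a single flag for the whole pipeline, not per stage:
    // the last parallel()/sequential() call before the terminal operation wins.
    static boolean switchedPipelineIsParallel() {
        Stream<Integer> s = NUMBERS.stream()
                .parallel()
                .map(n -> n * 2)
                .sequential();
        return s.isParallel();
    }

    public static void main(String[] args) {
        System.out.println(defaultStreamIsParallel());    // false
        System.out.println(parallelStreamIsParallel());   // true on OpenJDK for List.of
        System.out.println(switchedPipelineIsParallel()); // false
    }
}
```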

My assumptions:


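  1. There is no functional difference between sequential and parallel streams. Output is never affected by the execution mode.

  2. A parallel stream is always preferable, given an appropriate number of cores and a problem size large enough to justify the overhead, due to the performance gains.

  3. We want to write code once and run it anywhere without having to care about the hardware (this is Java, after all).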
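Assumption 1 does hold for pipelines that follow the rules in the `java.util.stream` package documentation (non-interfering, stateless lambdas and associative reductions). A minimal sketch with hypothetical names: the same pipeline run in both modes yields the same sum, and `collect(Collectors.toList())` on an ordered source keeps encounter order even in parallel.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SameResultDemo {
    // Sum of squares of 1..1000. The lambda is stateless and addition is
    // associative, so the execution mode does not change the result.
    static long sumOfSquares(boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, 1_000);
        if (parallel) {
            s = s.parallel();
        }
        return s.asLongStream().map(x -> x * x).sum();
    }

    // Even numbers in 1..20. IntStream.rangeClosed is ordered, and
    // Collectors.toList() preserves encounter order in parallel as well.
    static List<Integer> evens(boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, 20);
        if (parallel) {
            s = s.parallel();
        }
        return s.filter(x -> x % 2 == 0).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(false) == sumOfSquares(true)); // true
        System.out.println(evens(false).equals(evens(true)));         // true
    }
}
```

The equivalence is only as strong as the pipeline is well-behaved: operations without ordering guarantees, such as `forEach` or `findAny`, or lambdas that mutate shared state, can produce different observable results in parallel.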

Assuming these assumptions are valid (nothing wrong with a bit of meta-assumption), what's the value in having the execution mode exposed in the API?

It seems like you should just be able to declare a Stream, and the choice of sequential/parallel execution should be handled automagically in a layer below, either by library code or by the JVM itself, as a function of the cores available at runtime, the size of the problem, etc.

Sure, assuming parallel streams also work on a single-core machine, perhaps just always using a parallel stream achieves this. But this is really ugly: why have explicit references to parallel streams in my code when it's the default option?

Even if there is a scenario where you'd deliberately want to hard-code the use of a sequential stream, why is there not just a sub-interface SequentialStream for that purpose, rather than polluting Stream with an execution mode switch?

Recommended answer


It seems like you should just be able to declare a Stream, and the choice of sequential/parallel execution should be handled automagically in a layer below, either by library code or by the JVM itself, as a function of the cores available at runtime, the size of the problem, etc.

The reality is that a) streams are a library, and have no special JVM magic, and b) you can't really design a library smart enough to automagically figure out what the right decision is in this particular case. There's no sensible way to estimate how costly a particular function will be without running it (even if you could introspect its implementation, which you can't), and now you're introducing a benchmark into every stream operation, trying to figure out whether parallelizing it will be worth the cost of the parallelism overhead. That's just not practical, especially given that you don't know in advance how bad the parallelism overhead is, either.
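To illustrate why cost is invisible to the library, here is a sketch with two hypothetical operators, `CHEAP` and `EXPENSIVE`, that compute the same values, one trivially and one with deliberate busy work. To the stream they are both just `IntUnaryOperator` references passed to `map`; nothing in the pipeline can tell them apart without running them.

```java
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class OpaqueCostDemo {
    // Cheap: one multiplication per element.
    static final IntUnaryOperator CHEAP = x -> x * 2;

    // Expensive: a million additions per element, yet the same result (2 * x).
    static final IntUnaryOperator EXPENSIVE = x -> {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += x;
        }
        return (int) (acc / 500_000);
    };

    // The pipeline code is identical for both operators. All the stream
    // library receives is an IntUnaryOperator; its cost is not visible.
    static int sumMapped(IntUnaryOperator op, boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, 100);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(op).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumMapped(CHEAP, false));    // 10100
        System.out.println(sumMapped(EXPENSIVE, true)); // 10100
    }
}
```

Any automatic mode selection would have to discover the difference between the two by actually running and timing them, which is the per-operation benchmark the answer calls impractical.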


A parallel stream is always preferable, given an appropriate number of cores and a problem size large enough to justify the overhead, due to the performance gains.

Not always, in practice. Some tasks are just so small that they're not worth parallelizing, and parallelism always carries some overhead. (And frankly, most programmers tend to overestimate the usefulness of parallelism, slapping it everywhere even when it's actually hurting performance.)
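A crude sketch of the "too small to be worth parallelizing" case: summing ten numbers sequentially and in parallel and timing each run once with `System.nanoTime()` after a short warm-up. The class and method names are hypothetical, and the timings will vary between runs and machines, so no particular numbers are claimed here; for a task this small the fork/join bookkeeping is likely to outweigh the work itself. A trustworthy comparison would use a benchmark harness such as JMH.

```java
import java.util.function.LongSupplier;
import java.util.stream.IntStream;

public class TinyTaskTiming {
    // A deliberately tiny task: the sum of 1..10.
    static long tinySum(boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, 10);
        if (parallel) {
            s = s.parallel();
        }
        return s.asLongStream().sum();
    }

    // Runs the task once; returns {result, elapsed nanoseconds}.
    // A single nanoTime() reading is noisy; this is not a proper benchmark.
    static long[] timeOnce(LongSupplier task) {
        long start = System.nanoTime();
        long result = task.getAsLong();
        long elapsed = System.nanoTime() - start;
        return new long[] {result, elapsed};
    }

    public static void main(String[] args) {
        // Short warm-up so JIT compilation and common-pool start-up are not
        // the only things being measured.
        for (int i = 0; i < 10_000; i++) {
            tinySum(false);
            tinySum(true);
        }
        long[] seq = timeOnce(() -> tinySum(false));
        long[] par = timeOnce(() -> tinySum(true));
        System.out.printf("sequential: sum=%d, %d ns%n", seq[0], seq[1]);
        System.out.printf("parallel:   sum=%d, %d ns%n", par[0], par[1]);
    }
}
```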

Basically, it's a hard enough problem that you have to shove it off onto the programmer.
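What leaving the decision to the programmer can look like in practice: a minimal sketch in which application code picks the mode from a size threshold. `PARALLEL_THRESHOLD` and its value of 10,000 are hypothetical placeholders; the Stream API neither supplies nor computes such a value, and a sensible cut-off depends on the per-element cost and the machine.

```java
import java.util.stream.IntStream;

public class ProgrammerChoosesMode {
    // Hypothetical cut-off picked by the programmer (ideally after measuring
    // the real workload); the Stream API does not provide such a constant.
    static final int PARALLEL_THRESHOLD = 10_000;

    // The execution-mode decision is explicit and lives in application code.
    static IntStream streamFor(int[] data) {
        IntStream s = IntStream.of(data);
        return data.length >= PARALLEL_THRESHOLD ? s.parallel() : s;
    }

    public static void main(String[] args) {
        int[] small = IntStream.rangeClosed(1, 100).toArray();
        int[] large = IntStream.rangeClosed(1, 20_000).toArray();

        System.out.println(streamFor(small).isParallel()); // false
        System.out.println(streamFor(large).isParallel()); // true
        System.out.println(streamFor(small).sum());        // 5050
    }
}
```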

