Java Stream API: why the distinction between sequential and parallel execution mode?

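Question

From the Stream javadoc:

Stream pipelines may execute either sequentially or in parallel. This execution mode is a property of the stream. Streams are created with an initial choice of sequential or parallel execution.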
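As a minimal sketch of what "execution mode is a property of the stream" looks like in code, the example below uses only the standard `stream()`, `parallel()`, `sequential()` and `isParallel()` calls. The class and method names (`ExecutionModeDemo`, `modeAfterSwitching`, `parallelSum`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ExecutionModeDemo {

    // The mode is a single flag on the whole pipeline: the last call to
    // parallel() or sequential() before the terminal operation wins.
    static boolean modeAfterSwitching(List<Integer> numbers) {
        return numbers.stream().parallel().sequential().isParallel();
    }

    // Same pipeline shape as a sequential sum; only the mode flag differs.
    static int parallelSum(List<Integer> numbers) {
        return numbers.stream().parallel().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

        System.out.println(numbers.stream().isParallel());            // false
        System.out.println(numbers.stream().parallel().isParallel()); // true
        System.out.println(modeAfterSwitching(numbers));              // false
        System.out.println(parallelSum(numbers));                     // 15
    }
}
```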

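My assumptions:

  1. There is no functional difference between a sequential and a parallel stream. Output is never affected by execution mode.
  2. A parallel stream is always preferable, given an appropriate number of cores and a problem size large enough to justify the overhead, due to the performance gains.
  3. We want to write code once and run it anywhere without having to care about the hardware (this is Java, after all).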

Assuming these assumptions are valid (nothing wrong with a bit of meta-assumption), what's the value in having the execution mode exposed in the API?

It seems like you should just be able to declare a Stream, and the choice of sequential/parallel execution should be handled automagically in a layer below, either by library code or by the JVM itself, as a function of the cores available at runtime, the size of the problem, etc.

Sure, assuming parallel streams also work on a single-core machine, perhaps just always using a parallel stream achieves this. But that is really ugly: why have explicit references to parallel streams in my code when it's the default option?

Even if there is a scenario where you'd deliberately want to hard-code the use of a sequential stream, why is there not just a sub-interface SequentialStream for that purpose, rather than polluting Stream with an execution mode switch?

Recommended answer

"It seems like you should just be able to declare a Stream, and the choice of sequential/parallel execution should be handled automagically in a layer below, either by library code or the JVM itself as a function of the cores available at runtime, the size of the problem, etc."

The reality is that a) streams are a library with no special JVM magic, and b) you can't really design a library smart enough to automagically figure out the right decision in each particular case. There's no sensible way to estimate how costly a particular function will be without running it (even if you could introspect its implementation, which you can't), so you would end up introducing a benchmark into every stream operation to figure out whether parallelizing it is worth the cost of the parallelism overhead. That's just not practical, especially given that you don't know in advance how bad the parallelism overhead is, either.
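To make the point concrete, here is a rough sketch of what library code actually sees. Everything named here (`OpaqueCostDemo`, `mapAll`, `CHEAP`, `EXPENSIVE`, `countPrimesBelow`) is hypothetical and not part of the Stream API. The helper receives only a `Function<Integer, Integer>`, so a trivial increment and a deliberately naive prime count look identical to it, and the caller, who knows the cost, passes the mode explicitly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OpaqueCostDemo {

    // Cheap per-element work.
    static final Function<Integer, Integer> CHEAP = x -> x + 1;

    // Much more expensive per-element work, but the same static type.
    static final Function<Integer, Integer> EXPENSIVE = OpaqueCostDemo::countPrimesBelow;

    // Deliberately naive trial division, used only to simulate a costly function.
    static int countPrimesBelow(int n) {
        int count = 0;
        for (int candidate = 2; candidate < n; candidate++) {
            boolean prime = true;
            for (int d = 2; d * d <= candidate; d++) {
                if (candidate % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                count++;
            }
        }
        return count;
    }

    // Stand-in for library code: it cannot tell CHEAP from EXPENSIVE,
    // so the caller supplies the execution mode.
    static List<Integer> mapAll(List<Integer> input,
                                Function<Integer, Integer> f,
                                boolean parallel) {
        Stream<Integer> s = parallel ? input.stream().parallel() : input.stream();
        return s.map(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapAll(Arrays.asList(1, 2, 3), CHEAP, false));       // [2, 3, 4]
        System.out.println(mapAll(Arrays.asList(10, 20, 100), EXPENSIVE, true)); // [4, 8, 25]
    }
}
```

Nothing in the signature of `mapAll` lets it choose the mode on its own, which is the position the Stream library is in.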

"A parallel stream is always preferable, given appropriate number of cores and problem size to justify the overhead, due to the performance gains."

Not always, in practice. Some tasks are just so small that they're not worth parallelizing, and parallelism always has some overhead. (And frankly, most programmers tend to overestimate the usefulness of parallelism, slapping it everywhere when it's actually hurting performance.)
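One rough way to see this for yourself is to time the same reduction in both modes on a tiny input and on a large one. The sketch below is only illustrative: the class and method names are made up, the input sizes are arbitrary, and a single `System.nanoTime()` measurement is not a rigorous benchmark (a harness such as JMH would be the more reliable choice), so no timings are claimed here.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

public class SmallTaskDemo {

    static long sumSequential(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    static long sumParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Crude one-shot timing; ignores JIT warm-up and other noise.
    static long timeNanos(LongSupplier task) {
        long start = System.nanoTime();
        task.getAsLong();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] sizes = {100L, 50_000_000L};
        for (long n : sizes) {
            long seq = timeNanos(() -> sumSequential(n));
            long par = timeNanos(() -> sumParallel(n));
            System.out.println("n=" + n + " sequential=" + seq + "ns parallel=" + par + "ns");
        }
    }
}
```

Both methods return the same sum; only the time spent differs, and which mode wins depends on the input size and the machine.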

Basically, it's a hard enough problem that you have to shove it off onto the programmer.