Iterator versus Stream of Java 8


Question

To take advantage of the wide range of query methods in java.util.stream of JDK 8, I am attempting to design domain models in which getters for relationships with * multiplicity (zero or more instances) return a Stream<T> instead of an Iterable<T> or Iterator<T>.

My doubt is whether a Stream<T> incurs any additional overhead compared with an Iterator<T>.

So, is there any disadvantage in compromising my domain model with a Stream<T>?

Or should I instead always return an Iterator<T> or Iterable<T>, and leave to the end user the decision of whether to use a stream, by converting that iterator with StreamUtils?

Note that returning a Collection is not a valid option, because in this case most of the relationships are lazy and of unknown size.
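To make the two options concrete, here is a minimal sketch of a getter that returns a Stream<T> next to one that returns an Iterable<T>. The Customer and Order types, their fields, and the bigOrderIds helper are hypothetical names invented for illustration, and the conversion uses the JDK's own StreamSupport rather than the StreamUtils helper mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class DomainModelSketch {

    // Hypothetical domain type, used only for this illustration.
    static final class Order {
        final String id;
        final int total;
        Order(String id, int total) { this.id = id; this.total = total; }
    }

    // Hypothetical aggregate with a "*" relationship to Order.
    static final class Customer {
        private final Iterable<Order> lazyOrders; // stands in for a lazy source of unknown size

        Customer(Iterable<Order> lazyOrders) { this.lazyOrders = lazyOrders; }

        // Option A: the getter exposes a Stream; each call builds a fresh stream.
        Stream<Order> orders() {
            return StreamSupport.stream(lazyOrders.spliterator(), false);
        }

        // Option B: the getter exposes an Iterable and the caller converts if desired.
        Iterable<Order> ordersIterable() {
            return lazyOrders;
        }
    }

    // A query written against option A.
    static List<String> bigOrderIds(Customer customer, int minTotal) {
        return customer.orders()
                .filter(o -> o.total >= minTotal)
                .map(o -> o.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Customer customer = new Customer(Arrays.asList(
                new Order("A", 50), new Order("B", 150), new Order("C", 300)));

        System.out.println(bigOrderIds(customer, 100)); // prints [B, C]

        // The same query against option B: the caller does the conversion.
        long count = StreamSupport.stream(customer.ordersIterable().spliterator(), false)
                .filter(o -> o.total >= 100)
                .count();
        System.out.println(count); // prints 2
    }
}
```

One design point worth noting: a Stream can be consumed only once, so a getter like orders() has to create a new stream on every call instead of caching one.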

Recommended answer

There's lots of performance advice here, but sadly much of it is guesswork, and little of it points to the real performance considerations.

@Holger gets it right by pointing out that we should resist the seemingly overwhelming tendency to let the performance tail wag the API design dog.

While there are a zillion considerations that can make a stream slower than, the same as, or faster than some other form of traversal in any given case, there are some factors that point to streams having a performance advantage where it counts -- on big data sets.

There is some additional fixed startup overhead of creating a Stream compared to creating an Iterator -- a few more objects before you start calculating. If your data set is large, it doesn't matter; it's a small startup cost amortized over a lot of computation. (And if your data set is small, it probably also doesn't matter -- because if your program is operating on small data sets, performance is generally not your #1 concern either.) Where this does matter is when going parallel: any time spent setting up the pipeline goes into the serial fraction of Amdahl's law. If you look at the implementation, we work hard to keep the object count down during stream setup, but I'd be happy to find ways to reduce it, as that has a direct effect on the breakeven data set size where parallel starts to win over sequential.
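As a rough illustration of why setup time matters once you go parallel, Amdahl's law and a deliberately simplified breakeven model can be written out as below. The symbols are introduced here for illustration and are not from the answer: s is the serial fraction of the work, p the number of cores, N the number of elements, c the per-element cost, and t_0 the extra fixed setup cost of the parallel pipeline. The cost model assumes perfectly divisible work and ignores everything except those two terms.

```latex
% Amdahl's law: speedup on p cores when a fraction s of the work is serial
S(p) = \frac{1}{s + \frac{1 - s}{p}}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{s}

% Simplified cost model (an assumption for illustration only)
T_{\mathrm{seq}}(N) = N c, \qquad T_{\mathrm{par}}(N) = t_0 + \frac{N c}{p}

% Parallel wins when T_par(N) < T_seq(N)
t_0 + \frac{N c}{p} < N c
\;\Longleftrightarrow\;
N > \frac{t_0}{c \left(1 - \frac{1}{p}\right)} = \frac{t_0 \, p}{c \, (p - 1)}
```

Under this model the breakeven size grows linearly with t_0, which is why trimming objects out of stream setup directly lowers the data set size at which parallel starts to pay off.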

But more important than the fixed startup cost is the per-element access cost. Here, streams actually win -- and often win big -- which some may find surprising. (In our performance tests, we routinely see stream pipelines that can outperform their for-loop over Collection counterparts.) And there's a simple explanation for this: Spliterator has fundamentally lower per-element access costs than Iterator, even sequentially. There are several reasons for this.


  1. The Iterator protocol is fundamentally less efficient. It requires calling two methods to get each element. Further, because Iterators must be robust to things like calling next() without hasNext(), or hasNext() multiple times without next(), both of these methods generally have to do some defensive coding (and generally more statefulness and branching), which adds to inefficiency. On the other hand, even the slow way to traverse a spliterator (tryAdvance) doesn't have this burden. (It's even worse for concurrent data structures, because the next/hasNext duality is fundamentally racy, and Iterator implementations have to do more work to defend against concurrent modifications than do Spliterator implementations.)

  2. Spliterator further offers a "fast-path" iteration -- forEachRemaining -- which can be used most of the time (reduction, forEach), further reducing the overhead of the iteration code that mediates access to the data structure internals. This also tends to inline very well, which in turn increases the effectiveness of other optimizations such as code motion, bounds check elimination, etc.

  3. Further, traversal via Spliterator tends to have many fewer heap writes than with Iterator. With Iterator, every element causes one or more heap writes (unless the Iterator can be scalarized via escape analysis and its fields hoisted into registers). Among other issues, this causes GC card mark activity, leading to cache line contention for the card marks. On the other hand, Spliterators tend to have less state, and industrial-strength forEachRemaining implementations tend to defer writing anything to the heap until the end of the traversal, instead storing their iteration state in locals which naturally map to registers, resulting in reduced memory bus activity.
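The three points above can be sketched side by side for a plain array. This is an illustrative sketch, not the JDK's implementation: ArrayIterator, ArraySpliterator, and the sum helpers are hypothetical names written for this example, and the code makes no performance measurement; it only shows where the two method calls, the defensive check, and the per-element field write appear.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

public class TraversalSketch {

    // Iterator protocol: two calls per element, plus a defensive check in next().
    static final class ArrayIterator<T> implements Iterator<T> {
        private final T[] items;
        private int index; // heap field, updated once per element

        ArrayIterator(T[] items) { this.items = items; }

        @Override public boolean hasNext() { return index < items.length; }

        @Override public T next() {
            // next() may be called without hasNext(), so it must re-check the bound.
            if (index >= items.length) throw new NoSuchElementException();
            return items[index++];
        }
    }

    static final class ArraySpliterator<T> implements Spliterator<T> {
        private final T[] items;
        private int index;       // current position
        private final int fence; // one past the last position

        ArraySpliterator(T[] items, int origin, int fence) {
            this.items = items; this.index = origin; this.fence = fence;
        }

        // Slow path: a single call both tests for and delivers the next element.
        @Override public boolean tryAdvance(Consumer<? super T> action) {
            Objects.requireNonNull(action);
            if (index < fence) { action.accept(items[index++]); return true; }
            return false;
        }

        // Fast path: loop state lives in locals; the field is written once, up front.
        @Override public void forEachRemaining(Consumer<? super T> action) {
            Objects.requireNonNull(action);
            int i = index;
            final int hi = fence;
            index = hi; // mark this spliterator as exhausted with a single write
            for (; i < hi; i++) action.accept(items[i]);
        }

        @Override public Spliterator<T> trySplit() {
            int lo = index, mid = (lo + fence) >>> 1;
            if (lo >= mid) return null; // too small to split
            index = mid;                // this spliterator keeps the second half
            return new ArraySpliterator<>(items, lo, mid);
        }

        @Override public long estimateSize() { return fence - index; }

        @Override public int characteristics() { return ORDERED | SIZED | SUBSIZED; }
    }

    static long sumWithIterator(Integer[] data) {
        long total = 0;
        Iterator<Integer> it = new ArrayIterator<>(data);
        while (it.hasNext()) total += it.next();
        return total;
    }

    static long sumWithTryAdvance(Integer[] data) {
        long[] total = {0};
        Spliterator<Integer> sp = new ArraySpliterator<>(data, 0, data.length);
        while (sp.tryAdvance(x -> total[0] += x)) { /* one call per element */ }
        return total[0];
    }

    static long sumWithForEachRemaining(Integer[] data) {
        long[] total = {0};
        new ArraySpliterator<>(data, 0, data.length).forEachRemaining(x -> total[0] += x);
        return total[0];
    }

    static long sumWithStream(Integer[] data) {
        return StreamSupport.stream(new ArraySpliterator<>(data, 0, data.length), false)
                .mapToLong(Integer::longValue)
                .sum();
    }

    public static void main(String[] args) {
        Integer[] data = {1, 2, 3, 4, 5};
        System.out.println(sumWithIterator(data));         // prints 15
        System.out.println(sumWithTryAdvance(data));       // prints 15
        System.out.println(sumWithForEachRemaining(data)); // prints 15
        System.out.println(sumWithStream(data));           // prints 15
    }
}
```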

Summary: don't worry, be happy. Spliterator is a better Iterator, even without parallelism. (They're also generally just easier to write and harder to get wrong.)
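As a hedged illustration of the "easier to write" remark, here is a minimal sketch of a spliterator over a lazy source of unknown size, the situation described in the question. It extends the JDK's Spliterators.AbstractSpliterator, which only requires tryAdvance. The streamOf and countingSource helpers, and the convention that the source returns null when exhausted, are assumptions made for this example.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazySourceSketch {

    // Wraps a lazy source in a sequential Stream.
    // Assumption for this sketch: the supplier returns null once it is exhausted.
    static <T> Stream<T> streamOf(Supplier<T> nextOrNull) {
        Spliterator<T> sp = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE,                             // size is unknown
                Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                T item = nextOrNull.get();
                if (item == null) return false; // source exhausted
                action.accept(item);
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    // Hypothetical lazy source: yields 0, 1, ..., limit - 1, then null.
    static Supplier<Integer> countingSource(int limit) {
        int[] next = {0};
        return () -> {
            if (next[0] >= limit) return null;
            return next[0]++;
        };
    }

    public static void main(String[] args) {
        List<Integer> evens = streamOf(countingSource(5))
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
        System.out.println(evens); // prints [0, 2, 4]
    }
}
```

There is no hasNext()/next() pair to keep consistent here: a single method both tests for and delivers the next element.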
