How to design a returned stream that may use skip


Problem description

I have created a parsing library that accepts a provided input and returns a stream of Records. A program then calls this library and processes the results. In my case, my program uses something like:

recordStream.forEach(r -> insertIntoDB(r));

One of the input types that can be provided to the parsing library is a flat file, which may have a header row. The parsing library can therefore be configured to skip a header row. If a header row is configured, the library adds a skip(n) operation to the returned stream, e.g.:

Files.lines(input).skip(1).parallel().map(r -> createRecord(r));

The parsing library returns the resulting stream.

But it seems that skip, parallel and forEach do not play nicely together. The end programmer must invoke forEachOrdered instead, and it is poor design to put this requirement on the programmer: it expects them to know that they must use forEachOrdered whenever the input is a file with a header row.

How can I enforce the ordered requirement myself when necessary, within the construction of the returned stream chain, to return a fully functional stream to the program writer, instead of a stream with hidden limitations? Is the answer to wrap the stream in another stream?

Recommended answer

forEachOrdered is necessary not because of skip(), but because your stream is parallel. Even when the stream is parallel, skip(1) still skips the first element in encounter order, as the documentation indicates:


While skip() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of n, since skip(n) is constrained to skip not just any n elements, but the first n elements in the encounter order.
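A minimal sketch of that guarantee, assuming an in-memory list stands in for the file and that the class and method names (`SkipParallelDemo`, `skipHeader`) are hypothetical. On an ordered source, skip(1) drops the first element even when the pipeline runs in parallel, and collecting to a list preserves encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SkipParallelDemo {

    // Hypothetical helper: drops the first line (the "header") and maps the rest.
    // The source list is ordered, so skip(1) removes the first element in
    // encounter order, even though the pipeline is parallel.
    static List<String> skipHeader(List<String> lines) {
        return lines.stream()
                .skip(1)
                .parallel()
                .map(String::toUpperCase)
                .collect(Collectors.toList()); // collect keeps encounter order
    }

    public static void main(String[] args) {
        List<String> lines = List.of("header", "a", "b", "c");
        System.out.println(skipHeader(lines)); // prints [A, B, C]
    }
}
```

The header is never part of the result; what a parallel pipeline does not promise is the order in which a later forEach visits the remaining elements.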

It's clearly documented that forEach doesn't necessarily respect the order. Not using forEachOrdered when you care about the order is just a misuse of the Stream API:


The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism.
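To illustrate the difference, here is a small sketch with hypothetical names (`ForEachOrderDemo`, `visitWithForEach`, `visitWithForEachOrdered`). Both methods visit every element of a parallel stream, but only forEachOrdered is guaranteed to visit them in encounter order:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

public class ForEachOrderDemo {

    // forEach on a parallel stream: every element is visited, but the visiting
    // order is unspecified and may differ from run to run.
    static List<Integer> visitWithForEach(int n) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().boxed().forEach(seen::add);
        return seen;
    }

    // forEachOrdered on the same parallel stream: elements are visited in
    // encounter order, at the cost of some parallelism.
    static List<Integer> visitWithForEachOrdered(int n) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().boxed().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("forEach:        " + visitWithForEach(10));
        System.out.println("forEachOrdered: " + visitWithForEachOrdered(10));
    }
}
```

The first line of output may come out in any order; the second always lists 0 through 9 in sequence.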

I would not return a parallel stream from the library. I would return a sequential one (where forEach would respect the order), and let the caller call parallel() and accept the consequences if it wants to.
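One way this suggestion might look in the library, as a sketch only: `RecordParser`, `ParsedRecord`, `parse` and the `hasHeader` flag are hypothetical names, not the asker's actual API. The library adds skip(1) when a header is configured but never calls parallel() itself:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class RecordParser {

    // Hypothetical record type: simply wraps the raw line.
    static final class ParsedRecord {
        final String line;
        ParsedRecord(String line) { this.line = line; }
    }

    // Returns a sequential stream; the header row is skipped when configured.
    // The caller should close the stream, since Files.lines holds the file open.
    static Stream<ParsedRecord> parse(Path input, boolean hasHeader) throws IOException {
        Stream<String> lines = Files.lines(input);
        if (hasHeader) {
            lines = lines.skip(1);
        }
        return lines.map(ParsedRecord::new);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".txt");
        Files.write(tmp, Arrays.asList("id,name", "1,a", "2,b"));
        try (Stream<ParsedRecord> records = parse(tmp, true)) {
            // Sequential stream: forEach visits the records in file order.
            records.forEach(r -> System.out.println(r.line));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A caller that wants parallelism can still write `parse(path, true).parallel()` and then choose between forEach and forEachOrdered knowingly.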

Using a parallel stream by default is a bad idea.
