Java 8 stream processing not fluent


Problem description

I have a problem with Java 8 streams, where the data is processed in sudden bulks, rather than when they are requested. I have a rather complex stream-flow which has to be parallelised because I use concat to merge two streams.

My issue stems from the fact that data seems to be parsed in large bulks minutes - and sometimes even hours - apart. I would expect this processing to happen as soon as the Stream reads incoming data, to spread the workload. Bulk processing seems counterintuitive in almost every way.

So, the question is why this bulk-collection occurs and how I can avoid it.

My input is a Spliterator of unknown size and I use a forEach as the terminal operation.
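For context, that setup can be reproduced as below (class and method names are mine, not from the question). Note that a spliterator built with `Spliterators.spliteratorUnknownSize` cannot split its source evenly, so its `trySplit` hands off elements in batches; this batching is one plausible source of the bulk behavior described here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class UnknownSizeDemo {
    // Build a (possibly parallel) stream from an iterator via a
    // Spliterator of unknown size, mirroring the question's setup.
    static Stream<Integer> fromIterator(Iterator<Integer> it, boolean parallel) {
        Spliterator<Integer> sp =
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        return StreamSupport.stream(sp, parallel);
    }

    public static void main(String[] args) {
        List<Integer> out = fromIterator(List.of(1, 2, 3).iterator(), true)
                .map(i -> i * 10)
                .collect(Collectors.toList());
        System.out.println(out); // [10, 20, 30]
    }
}
```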

Answer

It’s a fundamental principle of parallel streams that the encounter order doesn’t have to match the processing order. This enables concurrent processing of items of sublists or subtrees while assembling a correctly ordered result, if necessary. This explicitly allows bulk processing and even makes it mandatory for the parallel processing of ordered streams.
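To illustrate that distinction (class and method names are mine, not from the answer): on a parallel stream, `forEach` may invoke its action in whatever order the chunks are processed, while `forEachOrdered` re-establishes the encounter order when emitting results.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

public class EncounterOrderDemo {
    // forEachOrdered invokes its action in encounter order even on a
    // parallel stream, although processing itself happens concurrently.
    static List<Integer> viaForEachOrdered(int n) {
        Queue<Integer> q = new ConcurrentLinkedQueue<>();
        IntStream.range(0, n).parallel().boxed().forEachOrdered(q::add);
        return List.copyOf(q);
    }

    public static void main(String[] args) {
        System.out.println(viaForEachOrdered(5)); // [0, 1, 2, 3, 4]
    }
}
```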

This behavior is determined by the particular Spliterator’s trySplit implementation. The specification says:


If this Spliterator is ORDERED, the returned Spliterator must cover a strict prefix of the elements

...

API Note:

An ideal trySplit method efficiently (without traversal) divides its elements exactly in half, allowing balanced parallel computation.

Why was this strategy fixed in the specification and not, e.g. an even/odd split?

Well, consider a simple use case. A list is to be filtered and collected into a new list, so the encounter order must be retained. With the prefix rule, that’s rather easy to implement: split off a prefix, filter both chunks concurrently, and afterwards add the result of filtering the prefix to the new list, followed by the filtered suffix.
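A sketch of that use case (names are mine): filtering an ordered source in parallel still yields an encounter-ordered result, precisely because each chunk’s filtered prefix is joined in front of its filtered suffix.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderedFilterDemo {
    // Filtering an ordered source in parallel: chunks are filtered
    // concurrently, but the collected result keeps encounter order.
    static List<Integer> evens(int n) {
        return IntStream.range(0, n).boxed()
                .parallel()
                .filter(i -> i % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(10)); // [0, 2, 4, 6, 8]
    }
}
```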

With an even/odd strategy, that’s impossible. You may filter both parts concurrently, but afterwards, you don’t know how to join the results correctly unless you track each item’s position throughout the entire operation.

Even then, joining these interleaved items would be much more complicated than performing an addAll per chunk.

You might have noticed that all of this only applies if you have an encounter order that might have to be retained. If your spliterator doesn’t report the ORDERED characteristic, it is not required to return a prefix. Nevertheless, the default implementation you might have inherited from AbstractSpliterator is designed to be compatible with ordered spliterators. Thus, if you want a different strategy, you have to implement the split operation yourself.
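A minimal sketch of such a hand-written split (class name hypothetical), here following the strict-prefix rule over an array slice; the trySplit body is the single place where a different strategy would go for an unordered spliterator:

```java
import java.util.Spliterator;
import java.util.function.Consumer;

public class PrefixSpliterator implements Spliterator<Integer> {
    private final int[] data;
    private int index;       // current position
    private final int fence; // one past the last element

    public PrefixSpliterator(int[] data, int index, int fence) {
        this.data = data;
        this.index = index;
        this.fence = fence;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        if (index < fence) {
            action.accept(data[index++]);
            return true;
        }
        return false;
    }

    @Override
    public Spliterator<Integer> trySplit() {
        int lo = index, mid = (lo + fence) >>> 1;
        if (lo >= mid) return null;                  // too small to split
        index = mid;                                 // keep the suffix...
        return new PrefixSpliterator(data, lo, mid); // ...hand back the strict prefix
    }

    @Override
    public long estimateSize() {
        return fence - index;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED;
    }
}
```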

Or you use a different way of implementing an unordered stream, e.g.

Stream.generate(()->{
    LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
    return Thread.currentThread().getName();
}).parallel().forEach(System.out::println);

which might be closer to what you expect.
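Alternatively (a sketch, assuming order really is irrelevant for your pipeline; names are mine): calling unordered() on an existing pipeline drops the ORDERED characteristic, freeing the implementation from the strict-prefix constraint when splitting and combining chunks.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnorderedDemo {
    // unordered() clears the ORDERED characteristic, so chunks may be
    // processed and combined in any order.
    static Set<Integer> squares(int n) {
        return IntStream.range(0, n).boxed()
                .unordered()
                .parallel()
                .map(i -> i * i)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // sort only for a deterministic printout
        System.out.println(new TreeSet<>(squares(4))); // [0, 1, 4, 9]
    }
}
```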
