Java 8: how to aggregate objects from a stream?


Question

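IMHO: I don't think this is a duplicate, because the two questions try to solve the problem in different ways, and especially because they call for completely different technical skills (and, finally, because I asked both questions myself).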

How can I aggregate items from an ordered stream, preferably in an intermediate operation?

This follows up on my other question: Java8 stream lines and aggregate with action on terminal line

I have a very large file of the form:

MASTER_REF1
    SUBREF1
    SUBREF2
    SUBREF3
MASTER_REF2
MASTER_REF3
    SUBREF1
    ...

Here SUBREF (if any) applies to MASTER_REF, and both are complex objects (you can imagine them as somewhat like JSON).

My first attempt grouped the lines with an operation that returns null while aggregating and returns a value once a group of lines is complete (a "group" of lines ends if line.charAt(0)!=' ').

This code is hard to read and requires a .filter(Objects::nonNull).
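For illustration only, the null-returning approach described above might look roughly like the sketch below. The question does not show the actual code, so the class name, the helper `groupingMapper` and the sentinel line are all hypothetical. The sketch relies on a stateful lambda inside `map`, which the Stream API discourages and which only behaves predictably on a sequential stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NullReturningGrouper {

    // Hypothetical helper: a stateful mapper that buffers lines and hands back
    // the finished group only when the first line of the next group arrives.
    static Function<String, List<String>> groupingMapper() {
        List<String> buffer = new ArrayList<>();
        return line -> {
            List<String> finished = null;
            // A line that does not start with a space opens a new group,
            // so whatever has been buffered so far is complete.
            boolean startsGroup = line.isEmpty() || line.charAt(0) != ' ';
            if (startsGroup && !buffer.isEmpty()) {
                finished = new ArrayList<>(buffer);
                buffer.clear();
            }
            buffer.add(line);
            return finished; // null while still aggregating
        };
    }

    static List<List<String>> group(Stream<String> lines) {
        // Assumption: an artificial sentinel line is appended so that the
        // last real group is flushed; the sentinel itself is never emitted.
        String sentinel = "<END>";
        return Stream.concat(lines, Stream.of(sentinel))
                .sequential()
                .map(groupingMapper())
                .filter(Objects::nonNull)
                // Collected here only to keep the sketch small; a real
                // pipeline would continue with map/filter/forEach instead.
                .collect(Collectors.toList());
    }
}
```

The sentinel line and the null filtering are what make this style awkward: the grouping logic is spread across the mapper, the filter and the way the input is prepared.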

I think one could achieve this using a .collect(groupingBy(...)) or a .reduce(...), but those are terminal operations, and using one is:


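  • not necessary in my case: the lines are ordered and should be grouped by their position, and the groups of lines are then transformed (map + filter + ... + foreach);

  • not a good idea either: I'm talking about a huge data file, far bigger than the total amount of RAM + swap... a terminal operation would saturate the available resources (as said above, by design I need to keep the groups in memory because they are transformed afterwards)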

Recommended answer

As I already noted in my answer to the previous question, some third-party libraries provide partial reduction operations. One such library is StreamEx, which I develop myself.

In the StreamEx library, a partial reduction is an intermediate stream operation that combines several input elements while some condition is met. Usually the condition is specified via a BiPredicate that is applied to a pair of adjacent stream elements and returns true when the elements should be combined. The simplest way to combine elements is to collect them into a List via the StreamEx.groupRuns() method, like this:

Stream<List<String>> records = StreamEx.of(Files.lines(path))
    .groupRuns((line1, line2) -> !line2.startsWith("MASTER"));

Here we start a new record when the second of two adjacent lines starts with "MASTER" (as in your example). Otherwise we continue the previous record.

Note that such a stream is still lazy. In sequential processing, at most one intermediate List<String> is created at a time. Parallel processing is also supported, though turning the Files.lines stream into parallel mode rarely improves performance (at least prior to Java 9).
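To make the laziness claim concrete, here is a minimal JDK-only sketch of the same idea. It is not StreamEx's implementation: the class `LazyRunGrouping` and its `groupRuns` method are hypothetical stand-ins that only cover the sequential case. Each call to `next()` builds exactly one group, so only the current group (plus one look-ahead element) is held in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiPredicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazyRunGrouping {

    // Hypothetical stand-in for the idea behind groupRuns(): adjacent
    // elements stay in one group while sameGroup returns true for them.
    static <T> Stream<List<T>> groupRuns(Stream<T> source,
                                         BiPredicate<? super T, ? super T> sameGroup) {
        Iterator<T> it = source.iterator();
        Iterator<List<T>> groups = new Iterator<List<T>>() {
            private T pending;          // first element of the next group, if already read
            private boolean hasPending; // true when 'pending' holds a look-ahead element

            @Override
            public boolean hasNext() {
                return hasPending || it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> group = new ArrayList<>();
                T previous = hasPending ? pending : it.next();
                hasPending = false;
                pending = null;
                group.add(previous);
                // Pull elements only until the predicate signals a new group.
                while (it.hasNext()) {
                    T current = it.next();
                    if (!sameGroup.test(previous, current)) {
                        pending = current; // keep it for the next call
                        hasPending = true;
                        break;
                    }
                    group.add(current);
                    previous = current;
                }
                return group;
            }
        };
        // Sequential only; closing the result also closes the source stream.
        return StreamSupport.stream(
                        Spliterators.spliteratorUnknownSize(
                                groups, Spliterator.ORDERED | Spliterator.NONNULL),
                        false)
                .onClose(source::close);
    }
}
```

Under these assumptions it would be called as `LazyRunGrouping.groupRuns(Files.lines(path), (line1, line2) -> !line2.startsWith("MASTER"))`, ideally inside a try-with-resources block so that the underlying file is closed.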
