是否有可能在java 8中执行一个惰性groupby,返回一个流? [英] Is it possible to do a lazy groupby, returning a stream, in java 8?

查看:91
本文介绍了是否有可能在java 8中执行一个惰性groupby,返回一个流?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想要通过对其行进行分组来处理一些大文本文件。

I have some large-ish text files that I want to process by grouping its lines.

我尝试使用新的流媒体功能,例如

I tried to use the new streaming features, like

return FileUtils.readLines(...) 
            .parallelStream()
            .map(...)
            .collect(groupingBy(pair -> pair[0]));

问题是,AFAIK,这会产生一张地图。

The problem is that, AFAIK, this generates a Map.

有没有办法像上面那样生成高级代码,例如,条目流?

Is there any way to have high level code like the one above that generates, for example, a Stream of Entries?

UPDATE :我正在寻找的是像python的 itertools.groupby 。我的文件已经排序(通过pair [0]),我只想逐个加载组。

UPDATE: What I'm looking for is something like python's itertools.groupby. My files are already sorted (by pair[0]), I just want to load the groups one by one.

我已经有了一个迭代解决方案。我只是想知道是否有更多的声明方式来做到这一点。顺便说一句,使用番石榴或其他第三方库不会是一个大问题。

I already have an iterative solution. I'm just wondering if there's a more declarative way to do that. Btw, using guava or another 3rd party library wouldn't be a big problem.

推荐答案

你想要实现的任务是相当的与分组不同。 groupingBy 不依赖于 Stream 的元素的顺序,而是依赖于 Map 应用于分类器的算法功能的结果。

The task you want to achieve is quite different from what grouping does. groupingBy does not rely on the order of the Stream’s elements but on the Map’s algorithm applied to the classifier Function’s result.

你想要的是什么将具有公共属性值的相邻项折叠到一个列表项中。只要您可以保证所有具有相同属性值的项目都是聚集的,就不需要按该属性对 Stream 进行排序。

What you want is to fold adjacent items having a common property value into one List item. It is not even necessary to have the Stream sorted by that property as long as you can guaranty that all items having the same property value are clustered.

也许有可能将此任务表示为减少,但对我来说结果看起来太复杂了。

Maybe it is possible to formulate this task as a reduction but to me the resulting structure looks too complicated.

所以,除非直接支持这个功能被添加到 Stream 中,基于迭代器的方法对我来说看起来最实用:

So, unless direct support for this feature gets added to the Streams, an iterator based approach looks most pragmatic to me:

class Folding<T,G> implements Spliterator<Map.Entry<G,List<T>>> {
    static <T,G> Stream<Map.Entry<G,List<T>>> foldBy(
            Stream<? extends T> s, Function<? super T, ? extends G> f) {
        return StreamSupport.stream(new Folding<>(s.spliterator(), f), false);
    }
    private final Spliterator<? extends T> source;
    private final Function<? super T, ? extends G> pf;
    private final Consumer<T> c=this::addItem;
    private List<T> pending, result;
    private G pendingGroup, resultGroup;

    Folding(Spliterator<? extends T> s, Function<? super T, ? extends G> f) {
        source=s;
        pf=f;
    }
    private void addItem(T item) {
        G group=pf.apply(item);
        if(pending==null) pending=new ArrayList<>();
        else if(!pending.isEmpty()) {
            if(!Objects.equals(group, pendingGroup)) {
                if(pending.size()==1)
                    result=Collections.singletonList(pending.remove(0));
                else {
                    result=pending;
                    pending=new ArrayList<>();
                }
                resultGroup=pendingGroup;
            }
        }
        pendingGroup=group;
        pending.add(item);
    }
    public boolean tryAdvance(Consumer<? super Map.Entry<G, List<T>>> action) {
        while(source.tryAdvance(c)) {
            if(result!=null) {
                action.accept(entry(resultGroup, result));
                result=null;
                return true;
            }
        }
        if(pending!=null) {
            action.accept(entry(pendingGroup, pending));
            pending=null;
            return true;
        }
        return false;
    }
    private Map.Entry<G,List<T>> entry(G g, List<T> l) {
        return new AbstractMap.SimpleImmutableEntry<>(g, l);
    }
    public int characteristics() { return 0; }
    public long estimateSize() { return Long.MAX_VALUE; }
    public Spliterator<Map.Entry<G, List<T>>> trySplit() { return null; }
}

生成的折叠 Stream <的惰性/ code>最好通过将其应用于无限流来证明:

The lazy nature of the resulting folded Stream can be best demonstrated by applying it to an infinite stream:

Folding.foldBy(Stream.iterate(0, i->i+1), i->i>>4)
       .filter(e -> e.getKey()>5)
       .findFirst().ifPresent(e -> System.out.println(e.getValue()));

这篇关于是否有可能在java 8中执行一个惰性groupby,返回一个流?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆