Collect stream with grouping, counting and filtering operations

Question

I'm trying to collect a stream while throwing away rarely used items, as in this example:

import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;
import org.junit.Test;

// The class name is illustrative; JUnit needs the test method to live in a class.
public class CommonlyUsedWordsTest {

    @Test
    public void shouldFilterCommonlyUsedWords() {
        // given
        List<String> allWords = Arrays.asList(
                "call", "feel", "call", "very", "call", "very", "feel", "very", "any");

        // when: count occurrences per word, then keep words seen more than twice
        Set<String> commonlyUsed = allWords.stream()
                .collect(groupingBy(Function.identity(), counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 2)
                .map(Map.Entry::getKey)
                .collect(toSet());

        // then
        assertThat(commonlyUsed, containsInAnyOrder("call", "very"));
    }
}

I have a feeling that this could be done much more simply. Am I right?

Recommended answer

Some time ago I wrote an experimental distinct(atLeast) method for my library:

public StreamEx<T> distinct(long atLeast) {
    if (atLeast <= 1)
        return distinct();
    // ConcurrentHashMap rejects null keys, so nulls are counted separately
    AtomicLong nullCount = new AtomicLong();
    ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
    return filter(t -> {
        if (t == null) {
            return nullCount.incrementAndGet() == atLeast;
        }
        // pass an element only at the moment its running count reaches atLeast
        return map.merge(t, 1L, Long::sum) == atLeast;
    });
}

So the idea was to use it like this:

Set<String> commonlyUsed = StreamEx.of(allWords).distinct(3).toSet();

This performs stateful filtering, which looks a bit ugly. I doubted whether such a feature would be useful, so I did not merge it into the master branch. Nevertheless, it does the job in a single stream pass. I should probably revive it. Meanwhile, you can copy this code into a static method and use it like this:

Set<String> commonlyUsed = distinct(allWords.stream(), 3).collect(Collectors.toSet());
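For reference, here is a minimal sketch of what such a static method might look like when the filter above is applied to a plain java.util.stream.Stream. The class name DistinctAtLeast is an assumption made for illustration, and this is an adaptation of the snippet above, not code taken from StreamEx.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical holder class; only the filtering logic comes from the answer.
public class DistinctAtLeast {

    // Returns a stream that passes each element exactly once: at the moment
    // its running occurrence count reaches atLeast.
    public static <T> Stream<T> distinct(Stream<T> stream, long atLeast) {
        if (atLeast <= 1) {
            return stream.distinct();
        }
        // ConcurrentHashMap rejects null keys, so nulls get their own counter.
        AtomicLong nullCount = new AtomicLong();
        ConcurrentHashMap<T, Long> counts = new ConcurrentHashMap<>();
        return stream.filter(t -> {
            if (t == null) {
                return nullCount.incrementAndGet() == atLeast;
            }
            return counts.merge(t, 1L, Long::sum) == atLeast;
        });
    }

    public static void main(String[] args) {
        List<String> allWords = Arrays.asList(
                "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        Set<String> commonlyUsed =
                distinct(allWords.stream(), 3).collect(Collectors.toSet());
        // Contains "call" and "very"; iteration order of the set is unspecified.
        System.out.println(commonlyUsed);
    }
}
```

Because the predicate keeps state between elements, the stream documentation would classify it as a stateful lambda. The counters used here are thread-safe, but which occurrence of an element gets through is not deterministic in a parallel stream.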

Update (2015/05/31): I added the distinct(atLeast) method to StreamEx 0.3.1. It is implemented using a custom spliterator. Benchmarks showed that this implementation is significantly faster for sequential streams than the stateful filtering described above, and in many cases it is also faster than the other solutions proposed in this topic. It also works when null is encountered in the stream: the groupingBy collector does not support null as a key, so groupingBy-based solutions fail with a NullPointerException if the stream contains null.
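The point about null can be checked with a small sketch. It assumes a word list containing null and uses a hypothetical class NullKeyDemo. The second method is a plain HashMap-based count, shown only as one null-tolerant alternative; it is not how StreamEx implements distinct(atLeast).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of StreamEx or the original question.
public class NullKeyDemo {

    // groupingBy-based counting as in the question; throws
    // NullPointerException when the classifier returns null.
    public static Set<String> viaGroupingBy(List<String> words, long atLeast) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() >= atLeast)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Null-tolerant alternative: HashMap accepts a null key.
    public static Set<String> viaHashMap(List<String> words, long atLeast) {
        Map<String, Long> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1L, Long::sum);
        }
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() >= atLeast) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("call", null, "call", null, "call", null, "any");
        try {
            viaGroupingBy(words, 3);
        } catch (NullPointerException e) {
            System.out.println("groupingBy rejected the null element");
        }
        // Contains "call" and null; iteration order of the set is unspecified.
        System.out.println(viaHashMap(words, 3));
    }
}
```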
