Collect stream with grouping, counting and filtering operations
Question
I'm trying to collect a stream while throwing away rarely used items, as in this example:
import java.util.*;
import java.util.function.Function;
import static java.util.stream.Collectors.*;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;
import org.junit.Test;

// The enclosing class name is illustrative; the original snippet showed only the method.
public class CommonWordsTest {
    @Test
    public void shouldFilterCommonlyUsedWords() {
        // given
        List<String> allWords = Arrays.asList(
            "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        // when: count occurrences per word, then keep words seen more than twice
        Set<String> commonlyUsed = allWords.stream()
            .collect(groupingBy(Function.identity(), counting()))
            .entrySet().stream().filter(e -> e.getValue() > 2)
            .map(Map.Entry::getKey).collect(toSet());
        // then
        assertThat(commonlyUsed, containsInAnyOrder("call", "very"));
    }
}
I have a feeling that this can be done much more simply. Am I right?
Recommended answer
Some time ago I wrote an experimental distinct(atLeast) method for my library:
public StreamEx<T> distinct(long atLeast) {
    if (atLeast <= 1)
        return distinct();
    // ConcurrentHashMap does not accept null keys, so nulls are counted separately
    AtomicLong nullCount = new AtomicLong();
    ConcurrentHashMap<T, Long> map = new ConcurrentHashMap<>();
    return filter(t -> {
        if (t == null) {
            return nullCount.incrementAndGet() == atLeast;
        }
        // let an element through exactly once: when its count reaches atLeast
        return map.merge(t, 1L, (u, v) -> (u + v)) == atLeast;
    });
}
So the idea was to use it like this:
Set<String> commonlyUsed = StreamEx.of(allWords).distinct(3).toSet();
This performs stateful filtering, which looks a little ugly. I doubted whether such a feature would be useful, so I did not merge it into the master branch. Nevertheless, it does the job in a single stream pass. I should probably revive it. Meanwhile, you can copy this code into a static method and use it like this:
Set<String> commonlyUsed = distinct(allWords.stream(), 3).collect(Collectors.toSet());
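For reference, the static-method adaptation mentioned above might look roughly like the following. This is only a sketch: the class name DistinctAtLeast and the parameter order (stream, atLeast) are assumptions chosen to match the call shown above, not code from StreamEx itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical holder class; the name is an assumption made for this sketch.
final class DistinctAtLeast {

    // Static adaptation of the distinct(atLeast) method shown earlier.
    // An element passes the filter only at the moment its running count
    // reaches atLeast, so each qualifying element is emitted exactly once.
    static <T> Stream<T> distinct(Stream<T> stream, long atLeast) {
        if (atLeast <= 1) {
            return stream.distinct();
        }
        // ConcurrentHashMap rejects null keys, so nulls get their own counter.
        AtomicLong nullCount = new AtomicLong();
        ConcurrentHashMap<T, Long> counts = new ConcurrentHashMap<>();
        return stream.filter(t -> {
            if (t == null) {
                return nullCount.incrementAndGet() == atLeast;
            }
            return counts.merge(t, 1L, Long::sum) == atLeast;
        });
    }

    public static void main(String[] args) {
        List<String> allWords = Arrays.asList(
                "call", "feel", "call", "very", "call", "very", "feel", "very", "any");
        Set<String> commonlyUsed =
                distinct(allWords.stream(), 3).collect(Collectors.toSet());
        // Contains "call" and "very"; iteration order of the set is unspecified.
        System.out.println(commonlyUsed);
    }
}
```

Note that the filter is stateful, so the result is well defined for a set-like collection of the output, but the encounter order of emitted elements should not be relied on in parallel streams.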
Update (2015/05/31): I added the distinct(atLeast) method to StreamEx 0.3.1. It is implemented using a custom spliterator. Benchmarks showed that this implementation is significantly faster for sequential streams than the stateful filtering described above, and in many cases it is also faster than the other solutions proposed in this topic. It also works correctly if null is encountered in the stream (the groupingBy collector doesn't support null as a key, so groupingBy-based solutions will fail if null is encountered).
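To illustrate the null caveat, here is a small sketch showing that a groupingBy-based count throws a NullPointerException when the classifier yields null. The class and method names (GroupingByNullDemo, failsOnNull) are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; names are assumptions made for this sketch.
final class GroupingByNullDemo {

    // Returns true if counting the words with groupingBy throws
    // NullPointerException, false if the collection succeeds.
    static boolean failsOnNull(List<String> words) {
        try {
            Map<String, Long> counts = words.stream()
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
            return false;
        } catch (NullPointerException e) {
            // groupingBy rejects elements that map to a null key
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnNull(Arrays.asList("call", "very", "call"))); // prints false
        System.out.println(failsOnNull(Arrays.asList("call", null, "call")));   // prints true
    }
}
```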