Performing more than one reduction in a single pass

Question

What is the idiom for performing more than one reduction in a single pass of a stream? Is it just to have one big reducer class, even if this violates SRP if more than one type of reduction computation is required?

Recommended answer

Presumably you want to avoid making multiple passes, as the pipeline stages might be expensive. Or you want to avoid collecting the intermediate values in order to run them through multiple collectors, since the cost of storing all the values might be too high.

As Brian Goetz noted, Collectors.summarizingInt will collect int values and perform multiple reductions on them, returning an aggregate structure called IntSummaryStatistics. There are similar collectors for summarizing double and long values.
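As a minimal sketch of the summarizing collectors, the following computes several statistics over word lengths in one pass. The class name `SummarizingExample`, the helper `lengthStats`, and the sample words are illustrative assumptions, not part of the original answer.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class SummarizingExample {
    // Collects count, sum, min, max and average of the word lengths
    // in a single traversal of the stream.
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream()
                    .collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        // Illustrative input: lengths are 8, 5 and 9.
        List<String> words = Arrays.asList("aardvark", "bison", "crocodile");
        IntSummaryStatistics stats = lengthStats(words);
        System.out.println("count=" + stats.getCount()
                + " min=" + stats.getMin()
                + " max=" + stats.getMax()
                + " sum=" + stats.getSum());
    }
}
```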

Unfortunately these perform only a fixed set of reductions, so if you want to do reductions different from what they do, you have to write your own collector.
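Writing your own collector for a different set of reductions might look roughly like the sketch below, which tracks the shortest and the longest string in one pass using `Collector.of`. The `ShortestLongest` and `Extremes` names and the choice of reductions are hypothetical, picked only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class ShortestLongest {
    // Hypothetical mutable accumulator holding both reductions.
    // Both fields stay null if no element was ever accepted.
    static final class Extremes {
        String shortest;
        String longest;

        // Accumulator step: update both reductions for one element.
        void accept(String s) {
            if (shortest == null || s.length() < shortest.length()) shortest = s;
            if (longest == null || s.length() > longest.length()) longest = s;
        }

        // Combiner step: fold another partial result into this one
        // (only used when the stream runs in parallel).
        Extremes merge(Extremes other) {
            if (other.shortest != null) accept(other.shortest);
            if (other.longest != null) accept(other.longest);
            return this;
        }
    }

    // Builds the collector from supplier, accumulator and combiner.
    static Collector<String, Extremes, Extremes> extremes() {
        return Collector.of(Extremes::new, Extremes::accept, Extremes::merge);
    }

    static Extremes find(List<String> words) {
        return words.stream().collect(extremes());
    }

    public static void main(String[] args) {
        Extremes e = find(Arrays.asList("aardvark", "bison", "crocodile"));
        System.out.println(e.shortest + " / " + e.longest);
    }
}
```

The cost of this approach is that every new combination of reductions needs its own accumulator class, which is the coupling the question is worried about.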

Here's a technique for using multiple, unrelated collectors in a single pass. We can use peek() to take a crack at every value going through the stream, passing it through undisturbed. The peek() operation takes a Consumer, so we need a way to adapt a Collector to a Consumer. The Consumer will be the Collector's accumulator function. But we also need to call the Collector's supplier function and store the object it creates for passing to the accumulator function. And we need a way to get the result out of the Collector. To do this, we'll wrap the Collector in a little helper class:

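import java.util.function.Consumer;
import java.util.stream.Collector;

// Adapts a Collector so that it can be fed from Stream.peek().
public class PeekingCollector<T,A,R> {
    final Collector<T,A,R> collector;
    final A acc;   // accumulation container created by the collector's supplier

    public PeekingCollector(Collector<T,A,R> collector) {
        this.collector = collector;
        this.acc = collector.supplier().get();
    }

    // Returns a Consumer that passes each element to the collector's accumulator.
    public Consumer<T> peek() {
        if (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
            // A concurrent collector's accumulator may be called from several threads.
            return t -> collector.accumulator().accept(acc, t);
        else
            // Otherwise guard the accumulator in case the stream runs in parallel.
            return t -> {
                synchronized (this) {
                    collector.accumulator().accept(acc, t);
                }
            };
    }

    // Applies the collector's finisher; call this after the pipeline has completed.
    public synchronized R get() {
        return collector.finisher().apply(acc);
    }
}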

To use this, we first have to create the wrapped collector and hang onto it. Then we run the pipeline and call peek, passing the wrapped collector. Finally we call get on the wrapped collector to get its result. Here's a simple example that filters and sorts some words, while also grouping them by first letter:

    // Assumes: import java.util.*; import static java.util.stream.Collectors.*;
    List<String> input = Arrays.asList(
        "aardvark", "crocodile", "antelope",
        "buffalo", "bustard", "cockatoo",
        "capybara", "bison", "alligator");

    PeekingCollector<String,?,Map<String,List<String>>> grouper =
        new PeekingCollector<>(groupingBy(s -> s.substring(0, 1)));

    List<String> output = input.stream()
                               .filter(s -> s.length() > 5)
                               .peek(grouper.peek())
                               .sorted()
                               .collect(toList());

    Map<String,List<String>> groups = grouper.get();
    System.out.println(output);
    System.out.println(groups);

The output is:

[aardvark, alligator, antelope, buffalo, bustard, capybara, cockatoo, crocodile]
{a=[aardvark, antelope, alligator], b=[buffalo, bustard], c=[crocodile, cockatoo, capybara]}

It's a bit cumbersome, as you have to write out the generic types for the wrapped collector (which is a bit unusual; they're often all inferred). But if the expense of processing or storing stream values is great enough, perhaps it's worth the trouble.

Finally, note that peek() can be called from multiple threads if the stream is run in parallel. For this reason, non-thread-safe collectors must be protected by a synchronized block. If the collector is thread-safe, we needn't synchronize around calling it. To determine this, we check the collector's CONCURRENT characteristic. If you run a parallel stream, it's preferable to place a concurrent collector (such as groupingByConcurrent or toConcurrentMap) within the peek operation; otherwise the synchronization within the wrapped collector may cause a bottleneck and slow down the entire stream.
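The characteristic check can be tried in isolation. The sketch below asks two standard collectors whether they report CONCURRENT. The class and method names (`ConcurrentCheck`, `isConcurrent`, and the two helpers) are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class ConcurrentCheck {
    // True if the collector declares that its accumulator may be called
    // concurrently on a single shared result container.
    static boolean isConcurrent(Collector<?, ?, ?> c) {
        return c.characteristics().contains(Collector.Characteristics.CONCURRENT);
    }

    static boolean groupingByIsConcurrent() {
        Collector<String, ?, Map<String, List<String>>> plain =
            Collectors.groupingBy(s -> s.substring(0, 1));
        return isConcurrent(plain);
    }

    static boolean groupingByConcurrentIsConcurrent() {
        Collector<String, ?, ConcurrentMap<String, List<String>>> concurrent =
            Collectors.groupingByConcurrent(s -> s.substring(0, 1));
        return isConcurrent(concurrent);
    }

    public static void main(String[] args) {
        System.out.println("groupingBy: " + groupingByIsConcurrent());
        System.out.println("groupingByConcurrent: " + groupingByConcurrentIsConcurrent());
    }
}
```

A collector that reports CONCURRENT takes the unsynchronized branch of the wrapper's peek() method, so a parallel stream does not serialize on a lock there.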
