Explanation of the aggregate Scala function

Question

I still don't understand the aggregate function.

For example, given:

val x = List(1,2,3,4,5,6)
val y = x.par.aggregate((0, 0))((x, y) => (x._1 + y, x._2 + 1), (x,y) => (x._1 + y._1, x._2 + y._2))

The result will be: (21,6)

Well, I think that (x,y) => (x._1 + y._1, x._2 + y._2) is to get the result in parallel, for example it will be (1 + 2, 1 + 1) and so on.

But this is exactly the part that leaves me confused:

(x, y) => (x._1 + y, x._2 + 1)

Why x._1 + y? And is x._2 equal to 0 here?

Thanks

Recommended answer

From the documentation:

def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B




Aggregates the results of applying an operator to subsequent elements.

This is a more general form of fold and reduce. It has similar semantics, but does not require the result to be a supertype of the element type. It traverses the elements in different partitions sequentially, using seqop to update the result, and then applies combop to results from different partitions. The implementation of this operation may operate on an arbitrary number of collection partitions, so combop may be invoked an arbitrary number of times.

For example, one might want to process some elements and then produce a Set. In this case, seqop would process an element and append it to the list, while combop would concatenate two lists from different partitions together. The initial value z would be an empty set.

pc.aggregate(Set[Int]())(_ += process(_), _ ++ _)
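The one-liner above is quoted from the documentation and leaves pc and process undefined. As a minimal sketch of the same idea, the following uses an immutable Set and simulates two partitions with grouped and foldLeft instead of a real parallel collection. The process function, the input data, and the partition size are assumptions made up for illustration.

```scala
object SetAggregateSketch {
  // Hypothetical stand-in for the documentation's `process` function.
  def process(n: Int): Int = n % 3

  val data = List(1, 2, 3, 4, 5, 6)

  // seqop: fold one element into the set accumulated for a partition.
  val seqop: (Set[Int], Int) => Set[Int] = (acc, elem) => acc + process(elem)

  // combop: merge the sets produced by two partitions.
  val combop: (Set[Int], Set[Int]) => Set[Int] = _ ++ _

  // Simulate two partitions of three elements each; each starts from the empty set (z).
  val partials: List[Set[Int]] =
    data.grouped(3).map(_.foldLeft(Set.empty[Int])(seqop)).toList

  // Combine the per-partition sets into the final result.
  val result: Set[Int] = partials.foldLeft(Set.empty[Int])(combop)
}
```

Because set union is associative and the empty set is its neutral element, the result here does not depend on how the data is split.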

Another example is calculating geometric mean from a collection of doubles (one would typically require big doubles for this).

B: the type of accumulated results
z: the initial value for the accumulated result of the partition. This will typically be the neutral element for the seqop operator (e.g. Nil for list concatenation or 0 for summation) and may be evaluated more than once
seqop: an operator used to accumulate results within a partition
combop: an associative operator used to combine results from different partitions
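The geometric mean mentioned in the documentation can be sketched with the same seqop/combop shape. This version accumulates a sum of logarithms plus a count instead of multiplying the values directly; that substitution is my own choice to sidestep overflow, not something the documentation prescribes. The input values and the two-element partitions are made up, and the partitions are simulated with grouped and foldLeft rather than a real parallel collection.

```scala
object GeometricMeanSketch {
  val values = List(1.0, 2.0, 4.0, 8.0)

  // z: neutral element for both operators (log-sum 0.0, count 0).
  val z: (Double, Int) = (0.0, 0)

  // seqop: add the log of one element and bump the count.
  val seqop: ((Double, Int), Double) => (Double, Int) =
    (acc, v) => (acc._1 + math.log(v), acc._2 + 1)

  // combop: add the partial log-sums and the partial counts.
  val combop: ((Double, Int), (Double, Int)) => (Double, Int) =
    (a, b) => (a._1 + b._1, a._2 + b._2)

  // Simulated partitions of two elements each.
  val partials: List[(Double, Int)] =
    values.grouped(2).map(_.foldLeft(z)(seqop)).toList

  val (logSum, count) = partials.foldLeft(z)(combop)

  // exp(mean of logs) equals the n-th root of the product.
  val geometricMean: Double = math.exp(logSum / count)
}
```

For these inputs the product is 64, so the geometric mean is the fourth root of 64, which equals the square root of 8.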

In your example B is a Tuple2[Int, Int]. The method seqop then takes a single element from the list, scoped as y, and updates the aggregate B to (x._1 + y, x._2 + 1). So it increments the second element in the tuple. This effectively puts the sum of elements into the first element of the tuple and the number of elements into the second element of the tuple.
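To see how seqop builds up the tuple, here is a small sketch that traces it over the whole list as if there were a single partition. It uses scanLeft, which records every intermediate accumulator. The names acc and elem are mine; they correspond to x and y in the question's lambda.

```scala
object SeqopTrace {
  val x = List(1, 2, 3, 4, 5, 6)

  // Same seqop as in the question: acc is the running (sum, count), elem is one list element.
  val seqop: ((Int, Int), Int) => (Int, Int) =
    (acc, elem) => (acc._1 + elem, acc._2 + 1)

  // Every intermediate accumulator, starting from z = (0, 0).
  val steps: List[(Int, Int)] = x.scanLeft((0, 0))(seqop)
  // List((0,0), (1,1), (3,2), (6,3), (10,4), (15,5), (21,6))
}
```

So x._2 is 0 only for the first element a partition sees, because the accumulator starts at z = (0, 0). After that it holds the number of elements processed so far.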

The method combop then takes the results from each parallel execution thread and combines them. Combination by addition provides the same results as if it were run on the list sequentially.
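The interplay of the two operators can be sketched without a parallel collection by splitting the list by hand. The helper simulatedAggregate below is hypothetical and not part of the Scala library. It folds each chunk with seqop starting from z, then merges the partial results with combop. A real parallel collection chooses its own partitioning, so the chunk size of 2 is only an assumption for illustration.

```scala
object AggregateSketch {
  // Hypothetical helper: mimics the shape of a parallel aggregate, but runs sequentially.
  def simulatedAggregate[A, B](xs: List[A], chunkSize: Int)(z: => B)(
      seqop: (B, A) => B,
      combop: (B, B) => B
  ): B = {
    // One partial result per simulated partition, each starting from z.
    val partials = xs.grouped(chunkSize).map(_.foldLeft(z)(seqop)).toList
    // Merge the partial results; an empty input yields z.
    partials.reduceOption(combop).getOrElse(z)
  }

  val x = List(1, 2, 3, 4, 5, 6)

  // Partial (sum, count) per partition: List((3,2), (7,2), (11,2))
  val partials: List[(Int, Int)] =
    x.grouped(2).map(_.foldLeft((0, 0))((acc, elem) => (acc._1 + elem, acc._2 + 1))).toList

  // Combined result: (21,6)
  val result: (Int, Int) =
    simulatedAggregate(x, 2)((0, 0))(
      (acc, elem) => (acc._1 + elem, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )
}
```

Note that combop receives two accumulators, which is why it adds x._2 + y._2, while seqop receives an accumulator and a single element, which is why it adds 1.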

Using B as a tuple is likely the confusing piece of this. You can break the problem down into two sub problems to get a better idea of what this is doing. res0 is the first element in the result tuple, and res1 is the second element in the result tuple.

// Sums all elements in parallel.
scala> x.par.aggregate(0)((x, y) => x + y, (x, y) => x + y)
res0: Int = 21

// Counts all elements in parallel.    
scala> x.par.aggregate(0)((x, y) => x + 1, (x, y) => x + y)
res1: Int = 6
