Example of the Scala aggregate function

Question

I have been looking and I cannot find an example or discussion of the aggregate function in Scala that I can understand. It seems pretty powerful.

Can this function be used to reduce the values of tuples to make a multimap-type collection? For example:

val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))

After applying aggregate:

Seq(("one" -> Seq("i", "1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv")))

Also, can you give an example of the parameters z, seqop, and combop? I'm unclear on what these parameters do.

Recommended answer

The aggregate function does not do that (except that it is a very general function, so it could be used to do that). What you want is groupBy, or at least something close to it. You start with a Seq[(String, String)] and group by the first item of each tuple (a function from (String, String) to String), which returns a Map[String, Seq[(String, String)]]. You then have to discard the first element of each tuple in the Seq[(String, String)] values.

So:

list.groupBy(_._1).mapValues(_.map(_._2))

There you get a Map[String, Seq[String]]. If you want a Seq instead of a Map, call toSeq on the result. I don't think there is any guarantee on the order of the resulting Seq, though.
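To make the types concrete, here is a small sketch of the groupBy approach on the list from the question. The object and value names are mine, chosen only for illustration. It uses a plain map over the entries rather than mapValues, since mapValues on a Map is deprecated in Scala 2.13 in favour of view.mapValues; the sort at the end is only there to get a stable order.

```scala
object GroupByExample {
  val list: Seq[(String, String)] =
    Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))

  // groupBy keeps the whole tuples as values: Map[String, Seq[(String, String)]]
  val byKey: Map[String, Seq[(String, String)]] = list.groupBy(_._1)

  // Keep only the second element of each tuple: Map[String, Seq[String]]
  val grouped: Map[String, Seq[String]] =
    byKey.map { case (key, pairs) => key -> pairs.map(_._2) }

  // As a sequence of pairs; sorted by key only to make the order predictable
  val asSeq: Seq[(String, Seq[String])] = grouped.toSeq.sortBy(_._1)

  def main(args: Array[String]): Unit = println(asSeq)
}
```

Within each group the values keep the order they had in the input, so "one" maps to Seq("i", "1").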

Aggregate is a more difficult function.

Consider first reduceLeft and reduceRight. Let as be a non-empty sequence as = Seq(a1, ..., an) of elements of type A, and let f: (A, A) => A be some way to combine two elements of type A into one. I will write it as a binary operator @, so a1 @ a2 rather than f(a1, a2). as.reduceLeft(@) computes (((a1 @ a2) @ a3) ... @ an). reduceRight puts the parentheses the other way: (a1 @ (a2 @ ... (an-1 @ an))). If @ happens to be associative, the parentheses do not matter. One could compute the result as (a1 @ ... @ ap) @ (ap+1 @ ... @ an) (there would be parentheses inside the two big groups too, but let's not care about that). The two parts could then be computed in parallel, whereas the nested bracketing in reduceLeft or reduceRight forces a fully sequential computation. But parallel computation is only possible when @ is known to be associative, and the reduceLeft method cannot know that.
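A quick way to see the bracketing is to use an operator that is not associative, such as subtraction, and compare it with addition, which is. This is only an illustrative sketch; the object and value names are made up for the example.

```scala
object ReduceBracketing {
  val as: Seq[Int] = Seq(1, 2, 3, 4)

  // Subtraction is not associative, so the bracketing changes the result.
  val left: Int = as.reduceLeft(_ - _)   // ((1 - 2) - 3) - 4
  val right: Int = as.reduceRight(_ - _) // 1 - (2 - (3 - 4))

  // Addition is associative: splitting the sequence in two and
  // reducing each part separately gives the same total.
  val (front, back) = as.splitAt(2)
  val splitSum: Int = front.reduce(_ + _) + back.reduce(_ + _)
  val wholeSum: Int = as.reduce(_ + _)
}
```

Here left is -8 and right is -2, while splitSum and wholeSum are both 10.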

Still, there could be a method reduce, whose caller is responsible for ensuring that the operation is associative. reduce would then order the calls as it sees fit, possibly doing them in parallel. Indeed, there is such a method.

There is a limitation with the various reduce methods, however. The elements of the Seq can only be combined into a result of the same type: @ has to be (A, A) => A. But one could have the more general problem of combining them into a B. One starts with a value b of type B and combines it with every element of the sequence. The operator @ is (B, A) => B, and one computes (((b @ a1) @ a2) ... @ an). foldLeft does that. foldRight does the same thing but starting with an. There, the @ operation has no chance of being associative. When one writes b @ a1 @ a2, it must mean (b @ a1) @ a2, as (a1 @ a2) would be ill-typed. So foldLeft and foldRight have to be sequential.
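As a small illustration, take A = String and B = Int and add up the lengths of the strings. The names below are invented for the sketch. Note that in Scala the operator passed to foldRight takes its arguments the other way round, (A, B) => B.

```scala
object FoldExample {
  val as: Seq[String] = Seq("one", "two", "three")

  // B = Int, A = String; the operator has type (B, A) => B
  val totalLength: Int = as.foldLeft(0)((b, a) => b + a.length)

  // foldRight starts from the last element; its operator has type (A, B) => B
  val totalLengthRight: Int = as.foldRight(0)((a, b) => a.length + b)

  // Building a string instead of a number makes the bracketing visible
  val shape: String = as.foldLeft("b")((b, a) => s"($b @ $a)")
}
```

The value shape comes out as "(((b @ one) @ two) @ three)", which is the left-nested form described above.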

Suppose, however, that each A can be turned into a B. Let's write this with !, so that a! is of type B. Suppose moreover that there is a + operation (B, B) => B, and that @ is such that b @ a is in fact b + a!. Rather than combining elements with @, one could first transform all of them to B with !, then combine them with +. That would be as.map(!).reduceLeft(+). And if + is associative, then that can be done with reduce and need not be sequential: as.map(!).reduce(+). There could be a hypothetical method as.associativeFold(b, !, +).
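The same string-length example can be spelled out with explicit functions standing in for !, + and @. The names bang, plus and at are placeholders I picked for this sketch, not library functions.

```scala
object MapThenReduce {
  val as: Seq[String] = Seq("one", "two", "three")

  // "!" turns an A (String) into a B (Int)
  val bang: String => Int = _.length
  // "+" combines two B values and is associative
  val plus: (Int, Int) => Int = _ + _
  // "@" is then defined as b + a!
  val at: (Int, String) => Int = (b, a) => plus(b, bang(a))

  // Sequential fold with @, starting from some b
  val folded: Int = as.foldLeft(100)(at)

  // Map with ! first, reduce with +, then add the starting b
  val mapped: Int = plus(100, as.map(bang).reduce(plus))
}
```

Both values are 111. Only the second form leaves the library free to group the additions as it likes.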

Aggregate is very close to that. It may be, however, that there is a more efficient way to implement b @ a than b + a!. For instance, if type B is List[A] and b @ a is a :: b, then a! will be a :: Nil and b1 + b2 will be b2 ::: b1. a :: b is way better than (a :: Nil) ::: b. To benefit from associativity but still use @, one first splits b + a1! + ... + an! into (b + a1! + ... + ap!) + (ap+1! + ... + an!), then goes back to using @ with (b @ a1 @ ... @ ap) + (ap+1! @ ... @ an). One still needs the ! on ap+1, because one must start with some b. And the + is still necessary too, appearing between the parentheses. To do that, as.associativeFold(b, !, +) could be changed to as.optimizedAssociativeFold(b, !, @, +).
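Here is the List example written out, as a sketch with names of my own choosing. It splits the input in two by hand, folds each part with the cheap @, and joins the two partial results with +.

```scala
object ListOps {
  // B = List[Int], A = Int
  val at: (List[Int], Int) => List[Int] = (b, a) => a :: b              // b @ a
  val bang: Int => List[Int] = a => a :: Nil                             // a!
  val plus: (List[Int], List[Int]) => List[Int] = (b1, b2) => b2 ::: b1  // b1 + b2

  val as: List[Int] = List(1, 2, 3, 4)
  val (front, back) = as.splitAt(2)

  // Fully sequential: (((Nil @ 1) @ 2) @ 3) @ 4
  val sequential: List[Int] = as.foldLeft(List.empty[Int])(at)

  // Split version: the first part starts from b (here Nil), the second
  // part has to start from bang(first element) because it needs some b.
  val firstPart: List[Int] = front.foldLeft(List.empty[Int])(at)
  val secondPart: List[Int] = back.tail.foldLeft(bang(back.head))(at)
  val combined: List[Int] = plus(firstPart, secondPart)
}
```

Both sequential and combined are List(4, 3, 2, 1); the two parts could have been computed independently.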

Back to +. + is associative, or equivalently, (B, +) is a semigroup. In practice, most of the semigroups used in programming happen to be monoids too, i.e. they contain a neutral element z (for zero) in B, so that for each b, z + b = b + z = b. In that case, the ! operation that makes sense is likely to be a! = z @ a. Moreover, as z is a neutral element, b @ a1 ... @ an = (b + z) @ a1 ... @ an, which is b + (z @ a1 ... @ an). So it is always possible to start the aggregation with z. If b is wanted instead, you do b + result at the end. With all those hypotheses, we can do as.aggregate(z)(@, +). That is what aggregate does. @ is the seqop argument (applied sequentially, as in z @ a1 @ a2 ... @ ap), and + is combop (applied to already partially combined results, as in (z @ a1 @ ... @ ap) + (z @ ap+1 @ ... @ an)).

To sum up, as.aggregate(z)(seqop, combop) computes the same thing as as.foldLeft(z)(seqop), provided that:


  • (B, combop, z) is a monoid
  • seqop(b,a) = combop(b, seqop(z,a))
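These two conditions can be spot-checked on a concrete choice of z, seqop and combop. The sketch below uses the string-length example (B = Int, A = String) and a handful of sample values that I picked arbitrarily; passing these checks is of course not a proof.

```scala
object AggregateLaws {
  val z: Int = 0
  val seqop: (Int, String) => Int = (b, a) => b + a.length
  val combop: (Int, Int) => Int = _ + _

  // Arbitrary sample values used for the checks
  val sampleBs: Seq[Int] = Seq(0, 3, 10)
  val sampleAs: Seq[String] = Seq("", "x", "three")

  // z is neutral for combop
  val neutral: Boolean =
    sampleBs.forall(b => combop(z, b) == b && combop(b, z) == b)

  // combop is associative
  val associative: Boolean =
    sampleBs.forall(x => sampleBs.forall(y => sampleBs.forall(w =>
      combop(combop(x, y), w) == combop(x, combop(y, w)))))

  // seqop(b, a) == combop(b, seqop(z, a))
  val compatible: Boolean =
    sampleBs.forall(b => sampleAs.forall(a => seqop(b, a) == combop(b, seqop(z, a))))
}
```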

An aggregate implementation may use the associativity of combop to group the computations as it likes (without swapping elements, however: + need not be commutative, and ::: is not). It may run them in parallel.
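To make the grouping tangible, here is a minimal sketch of one way such a computation could be organised. chunkedAggregate is a hypothetical helper written for this answer, not the library's implementation, and it runs sequentially; it only shows that folding consecutive chunks from z and merging them with combop agrees with a plain foldLeft when the conditions hold. As far as I know, aggregate on ordinary sequential collections is deprecated since Scala 2.13 in favour of foldLeft, which is another reason to write the chunking by hand here.

```scala
object ChunkedAggregate {
  // Hypothetical helper: fold each consecutive chunk from z with seqop,
  // then merge the partial results, in order, with combop.
  // chunkSize is assumed to be positive.
  def chunkedAggregate[A, B](as: Seq[A], chunkSize: Int)(z: B)(
      seqop: (B, A) => B, combop: (B, B) => B): B =
    as.grouped(chunkSize)          // consecutive chunks, original order kept
      .map(_.foldLeft(z)(seqop))   // each chunk starts again from z
      .foldLeft(z)(combop)         // partial results are merged with combop

  val words: Seq[String] = Seq("a", "bb", "ccc", "dddd", "eeeee")

  // The List example: b @ a is a :: b, b1 + b2 is b2 ::: b1, z is Nil
  val seqop: (List[String], String) => List[String] = (b, a) => a :: b
  val combop: (List[String], List[String]) => List[String] = (b1, b2) => b2 ::: b1

  val chunked: List[String] = chunkedAggregate(words, 2)(List.empty[String])(seqop, combop)
  val sequential: List[String] = words.foldLeft(List.empty[String])(seqop)
}
```

Both chunked and sequential are the reversed list. The chunks are merged in their original order, which matters because ::: is not commutative.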

Finally, solving the initial problem using aggregate is left as an exercise to the reader. A hint: implement it using foldLeft, then find the z and combop that satisfy the conditions stated above.
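For readers who want to compare notes, here is one possible solution, offered as a sketch rather than the definitive answer. The MultiMap alias and all value names are mine. The parallel split is simulated by cutting the list in two by hand; the same z, seqop and combop are what one would pass to aggregate.

```scala
object MultimapByFold {
  type MultiMap = Map[String, Seq[String]]

  val list: Seq[(String, String)] =
    Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))

  // z: the empty multimap is the neutral element of combop
  val z: MultiMap = Map.empty

  // seqop: append one value to the sequence stored under its key
  val seqop: (MultiMap, (String, String)) => MultiMap = {
    case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, Seq.empty[String]) :+ v)
  }

  // combop: merge two multimaps, keeping left values before right values
  val combop: (MultiMap, MultiMap) => MultiMap = (m1, m2) =>
    m2.foldLeft(m1) {
      case (acc, (k, vs)) => acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ vs)
    }

  // Plain sequential fold
  val whole: MultiMap = list.foldLeft(z)(seqop)

  // Simulated split: fold each half from z, then combine
  val (front, back) = list.splitAt(2)
  val combined: MultiMap = combop(front.foldLeft(z)(seqop), back.foldLeft(z)(seqop))
}
```

Both whole and combined map "one" to Seq("i", "1"), "two" to Seq("2", "ii") and "four" to Seq("iv"). Call toSeq on the result if a sequence of pairs is wanted.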
