Aggregating arrays element-wise

Question

I'm pretty new to Spark/Scala. I'm wondering whether there is an easy way to aggregate an Array[Double] column element-wise, that is, position by position across rows. Here is an example:

c1   c2   c3
-------------------------
1     1   [1.0, 1.0, 3.4]
1     2   [1.0, 0.0, 4.3]
2     1   [0.0, 0.0, 0.0]
2     3   [1.2, 1.1, 1.1]

After aggregating by c1, I would like to end up with a table that looks like this:

c1   c3prime
-------------
1     [2.0, 1.0, 7.7]
2     [1.2, 1.1, 1.1]
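
To pin down the intended semantics, here is a minimal sketch using plain Scala collections (no Spark) that groups the example rows by c1 and adds each group's c3 arrays position by position. The names `rows`, `addElementWise` and `c3prime` are illustrative only, and the sketch assumes all arrays in a group have the same length.

```scala
// The example rows as (c1, c2, c3) tuples; no Spark involved.
val rows = Seq(
  (1, 1, Seq(1.0, 1.0, 3.4)),
  (1, 2, Seq(1.0, 0.0, 4.3)),
  (2, 1, Seq(0.0, 0.0, 0.0)),
  (2, 3, Seq(1.2, 1.1, 1.1))
)

// Adds two sequences position by position. zip stops at the shorter input,
// so this is only correct when both have the same length.
def addElementWise(a: Seq[Double], b: Seq[Double]): Seq[Double] =
  a.zip(b).map { case (u, v) => u + v }

// Group by c1, ignore c2, and reduce each group's c3 arrays into one array.
val c3prime: Map[Int, Seq[Double]] =
  rows.groupBy(_._1).map { case (c1, group) =>
    c1 -> group.map(_._3).reduce(addElementWise)
  }

// Print the groups in c1 order.
c3prime.toSeq.sortBy(_._1).foreach { case (c1, sums) =>
  println(s"$c1 ${sums.mkString("[", ", ", "]")}")
}
```

Using `reduce` instead of a zero-filled starting value is safe here because `groupBy` never produces an empty group.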

I'm looking at UDAFs now, but I'm wondering whether I need to write any code at all.

Thanks for your consideration.

Recommended answer

Assuming the arrays in c3 all have the same length, you can collect each group's arrays with collect_list and sum them element-wise with a UDF like the one below:

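// Assumes spark-shell, where `spark` already exists; elsewhere, create a SparkSession first.
import org.apache.spark.sql.functions.{collect_list, udf}
import spark.implicits._  // enables .toDF on a local Seq

val df = Seq(
  (1, 1, Seq(1.0, 1.0, 3.4)),
  (1, 2, Seq(1.0, 0.0, 4.3)),
  (2, 1, Seq(0.0, 0.0, 0.0)),
  (2, 3, Seq(1.2, 1.1, 1.1))
).toDF("c1", "c2", "c3")

// Sums a group's arrays position by position, starting from an all-zero array.
// Every array must be as long as the first one: zip silently truncates to the shorter input.
val elementSum = udf(
  (arrays: Seq[Seq[Double]]) => {
    val zeroSeq = Seq.fill[Double](arrays.head.size)(0.0)
    arrays.foldLeft(zeroSeq)(
      (acc, x) => (acc zip x).map { case (u, v) => u + v }
    )
  }
)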

val df2 = df.groupBy("c1").agg(
  elementSum(collect_list("c3")).as("c3prime")
)

df2.show(truncate=false)
// +---+-----------------------------+
// |c1 |c3prime                      |
// +---+-----------------------------+
// |1  |[2.0, 1.0, 7.699999999999999]|
// |2  |[1.2, 1.1, 1.1]              |
// +---+-----------------------------+
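
On the question of whether custom code is needed at all: on Spark 2.4 or later, the same result can probably be obtained with built-in SQL higher-order functions (`aggregate`, `zip_with`, `array_repeat`) instead of a UDF. The sketch below assumes that Spark version, a local SparkSession, and equal-length arrays within each group; `arrs` and `df3` are illustrative names, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, expr}

// Local session; in spark-shell, getOrCreate() returns the existing `spark`.
val spark = SparkSession.builder().master("local[*]").appName("elementwise-sum").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, 1, Seq(1.0, 1.0, 3.4)),
  (1, 2, Seq(1.0, 0.0, 4.3)),
  (2, 1, Seq(0.0, 0.0, 0.0)),
  (2, 3, Seq(1.2, 1.1, 1.1))
).toDF("c1", "c2", "c3")

// Collect each group's arrays, then fold them in SQL: start from an all-zero array
// as long as the first one and add each array position by position with zip_with.
// If lengths differ, zip_with pads with nulls, so the affected sums become null.
val df3 = df.groupBy("c1")
  .agg(collect_list("c3").as("arrs"))
  .select(
    col("c1"),
    expr("""aggregate(
      arrs,
      array_repeat(0D, size(arrs[0])),
      (acc, x) -> zip_with(acc, x, (u, v) -> u + v)
    )""").as("c3prime")
  )

df3.orderBy("c1").show(truncate = false)
```

Like the UDF version, this still relies on collect_list, so all arrays of a group are held together in memory; for very large groups, a custom aggregator (the UDAF route mentioned in the question) may scale better.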
