Aggregating arrays element-wise
Question
I'm pretty new to Spark/Scala and am wondering whether there is an easy way to aggregate an Array[Double] column element-wise. Here is an example:
c1 c2 c3
-------------------------
1 1 [1.0, 1.0, 3.4]
1 2 [1.0, 0.0, 4.3]
2 1 [0.0, 0.0, 0.0]
2 3 [1.2, 1.1, 1.1]
Then, after aggregation, I would like to end up with a table that looks like this:
c1 c3prime
-------------
1 [2.0, 1.0, 7.7]
2 [1.2, 1.1, 1.1]
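To pin down the expected semantics independently of Spark, here is a minimal plain-Scala sketch (standard library only; the object name `ElementWiseLocal` is hypothetical) that reproduces the second table from the first: group the rows by c1, transpose each group's arrays so that every inner sequence holds one position, and sum each position. It assumes all arrays within a group have the same length, which `transpose` enforces by throwing an exception otherwise.

```scala
// Plain Scala, no Spark: the element-wise sum the question asks for,
// computed on an in-memory copy of the example rows.
object ElementWiseLocal {
  // (c1, c2, c3) rows from the example table
  val rows: Seq[(Int, Int, Seq[Double])] = Seq(
    (1, 1, Seq(1.0, 1.0, 3.4)),
    (1, 2, Seq(1.0, 0.0, 4.3)),
    (2, 1, Seq(0.0, 0.0, 0.0)),
    (2, 3, Seq(1.2, 1.1, 1.1))
  )

  // Group by c1, transpose the group's arrays (one inner Seq per position),
  // then sum each position. transpose throws IllegalArgumentException
  // if the arrays in a group have different lengths.
  val c3prime: Map[Int, Seq[Double]] =
    rows.groupBy(_._1).map { case (c1, rs) =>
      c1 -> rs.map(_._3).transpose.map(_.sum)
    }

  def main(args: Array[String]): Unit =
    c3prime.toSeq.sortBy(_._1).foreach { case (k, v) =>
      println(s"$k ${v.mkString("[", ", ", "]")}")
    }
}
```

The third element for c1 = 1 is not exactly 7.7 in floating point; it comes out as 7.699999999999999, the same rounding visible in the Spark output of the answer.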
I'm looking at UDAFs now, but I was wondering whether I need to write any code at all.
Thanks for your consideration.
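On the UDAF question: any such aggregation comes down to an initial buffer, a per-row update, and a merge of partial buffers coming from different partitions (Spark's typed `Aggregator` names these `zero`, `reduce` and `merge`, plus `finish` for the output). The sketch below models only that shape in plain Scala; `ElementSumAgg` and its `Buf` type are hypothetical and this is not Spark's API. One design point it surfaces: `zero` cannot know the array length in advance, so the buffer is an `Option` that stays `None` until the first row arrives.

```scala
// Hypothetical plain-Scala model of an element-wise-sum UDAF.
// Not Spark's API: it only mirrors the zero / reduce / merge / finish shape.
object ElementSumAgg {
  // None means "no rows seen yet", since zero cannot know the array length.
  type Buf = Option[Vector[Double]]

  val zero: Buf = None

  // Fold one input array into the running buffer.
  def reduce(buf: Buf, row: Seq[Double]): Buf = merge(buf, Some(row.toVector))

  // Combine two partial buffers, e.g. from different partitions.
  def merge(a: Buf, b: Buf): Buf = (a, b) match {
    case (None, other) => other
    case (other, None) => other
    case (Some(x), Some(y)) =>
      require(x.length == y.length, s"size mismatch: ${x.length} vs ${y.length}")
      Some(x.zip(y).map { case (u, v) => u + v })
  }

  // Turn the final buffer into the output array (empty if no rows were seen).
  def finish(buf: Buf): Seq[Double] = buf.getOrElse(Vector.empty)

  def main(args: Array[String]): Unit = {
    // Two "partitions" holding the c1 = 1 rows, reduced separately, then merged.
    val p1 = Seq(Seq(1.0, 1.0, 3.4)).foldLeft(zero)(reduce)
    val p2 = Seq(Seq(1.0, 0.0, 4.3)).foldLeft(zero)(reduce)
    println(finish(merge(p1, p2)))
  }
}
```

Reducing each partition separately and then merging gives the same per-position sums as reducing all rows in one pass, which is the property a real UDAF's merge step has to preserve.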
Recommended answer
Assuming the arrays in c3 are all the same size, you can sum the column element-wise with a UDF like the one below:
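import org.apache.spark.sql.functions.{collect_list, udf}
import spark.implicits._ // for toDF; assumes a SparkSession named `spark`, as in spark-shell

val df = Seq(
  (1, 1, Seq(1.0, 1.0, 3.4)),
  (1, 2, Seq(1.0, 0.0, 4.3)),
  (2, 1, Seq(0.0, 0.0, 0.0)),
  (2, 3, Seq(1.2, 1.1, 1.1))
).toDF("c1", "c2", "c3")

// Sums a list of equal-length arrays position by position.
def elementSum = udf(
  (a: Seq[Seq[Double]]) => {
    // Start from an all-zero array as long as the first input array
    val zeroSeq = Seq.fill[Double](a(0).size)(0.0)
    a.foldLeft(zeroSeq)(
      (acc, x) => (acc zip x).map { case (u, v) => u + v }
    )
  }
)

// collect_list gathers each group's c3 arrays into an array of arrays,
// which the UDF then reduces to a single element-wise sum.
val df2 = df.groupBy("c1").agg(
  elementSum(collect_list("c3")).as("c3prime")
)
df2.show(truncate=false)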
// +---+-----------------------------+
// |c1 |c3prime |
// +---+-----------------------------+
// |1 |[2.0, 1.0, 7.699999999999999]|
// |2 |[1.2, 1.1, 1.1] |
// +---+-----------------------------+
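One caveat on the same-size assumption: the fold relies on `zip`, which silently truncates to the shorter sequence, so ragged arrays within a group produce a shorter result rather than an error. The plain-Scala sketch below (hypothetical object `ElementSumCheck`, no Spark needed) runs the answer's fold body on ragged input and contrasts it with a variant that validates the input with `require` first; inside a UDF, the thrown exception would fail the task instead of returning a truncated array.

```scala
import scala.util.Try

object ElementSumCheck {
  // The answer's fold body as a plain function (accumulator renamed to acc).
  def elementSumUnchecked(a: Seq[Seq[Double]]): Seq[Double] = {
    val zeroSeq = Seq.fill[Double](a(0).size)(0.0)
    a.foldLeft(zeroSeq)((acc, x) => (acc zip x).map { case (u, v) => u + v })
  }

  // Same fold, but rejects empty or ragged input instead of truncating.
  def elementSumChecked(a: Seq[Seq[Double]]): Seq[Double] = {
    require(a.nonEmpty, "no arrays to sum")
    val n = a.head.size
    require(a.forall(_.size == n), "arrays differ in length")
    a.foldLeft(Seq.fill[Double](n)(0.0))((acc, x) => (acc zip x).map { case (u, v) => u + v })
  }

  def main(args: Array[String]): Unit = {
    val ragged = Seq(Seq(1.0, 2.0, 3.0), Seq(10.0, 20.0))
    println(elementSumUnchecked(ragged)) // List(11.0, 22.0): third position silently dropped
    println(Try(elementSumChecked(ragged))) // a Failure wrapping IllegalArgumentException
  }
}
```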