How to use collect_set and collect_list functions in windowed aggregation in Spark 1.6?
Question
In Spark 1.6.0 / Scala, is there a way to compute collect_list("colC") or collect_set("colC") over a window, i.e. collect_set("colC").over(Window.partitionBy("colA").orderBy("colB"))?
Recommended answer
Suppose you have the following DataFrame:
+----+----+----+
|colA|colB|colC|
+----+----+----+
|1   |1   |23  |
|1   |2   |63  |
|1   |3   |31  |
|2   |1   |32  |
|2   |2   |56  |
+----+----+----+
You can use Window functions as follows:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
df.withColumn("colD", collect_list("colC").over(Window.partitionBy("colA").orderBy("colB"))).show(false)
Result:
+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23]        |
|1   |2   |63  |[23, 63]    |
|1   |3   |31  |[23, 63, 31]|
|2   |1   |32  |[32]        |
|2   |2   |56  |[32, 56]    |
+----+----+----+------------+
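Why colD grows row by row: when a window specification has an orderBy but no explicit frame, Spark SQL's default frame runs from the start of the partition up to the current row. The following is a minimal plain-Scala sketch of that behaviour, with no Spark involved. The names FrameModel and runningFrames are made up for illustration, and the sketch ignores ties in colB (the example data has none).

```scala
// Plain-Scala model of the default "start of partition to current row" frame.
// FrameModel and runningFrames are hypothetical names, not part of Spark.
object FrameModel {
  // Rows mirror the example DataFrame: (colA, colB, colC).
  val rows: Seq[(Int, Int, Int)] =
    Seq((1, 1, 23), (1, 2, 63), (1, 3, 31), (2, 1, 32), (2, 2, 56))

  // For each colA partition, sort by colB and attach to every row the colC
  // values from the first row of the partition up to and including that row.
  def runningFrames(data: Seq[(Int, Int, Int)]): Seq[(Int, Int, Int, Seq[Int])] =
    data.groupBy(_._1).toSeq.sortBy(_._1).flatMap { case (_, partition) =>
      val sorted = partition.sortBy(_._2)
      sorted.zipWithIndex.map { case ((a, b, c), i) =>
        (a, b, c, sorted.take(i + 1).map(_._3))
      }
    }

  def main(args: Array[String]): Unit =
    runningFrames(rows).foreach(println)
}
```

The fourth element of each printed tuple should line up with the colD column in the result above.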
The result for collect_set is similar, but the elements of the resulting set are not guaranteed to appear in the same order as with collect_list:
df.withColumn("colD", collect_set("colC").over(Window.partitionBy("colA").orderBy("colB"))).show(false)
+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23]        |
|1   |2   |63  |[63, 23]    |
|1   |3   |31  |[63, 31, 23]|
|2   |1   |32  |[32]        |
|2   |2   |56  |[56, 32]    |
+----+----+----+------------+
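If a deterministic element order is needed for the set, one option is to sort each collected array with sort_array. This is only a sketch: it assumes df is the example DataFrame above, that colC is an integer column, and that sort_array can wrap a windowed aggregate in your Spark version. The names w and sortedSets are illustrative.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_set, sort_array}

// `df` is assumed to be the example DataFrame with columns colA, colB, colC.
val w = Window.partitionBy("colA").orderBy("colB")

// sort_array sorts each collected array in ascending order, so the displayed
// result no longer depends on the internal ordering of the set.
val sortedSets = df.withColumn("colD", sort_array(collect_set("colC").over(w)))

sortedSets.show(false)
```

Note that sorting orders the elements by value, not by colB, so this differs from the collect_list output whenever the two orders disagree.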
If you remove orderBy, as below:
df.withColumn("colD", collect_list("colC").over(Window.partitionBy("colA"))).show(false)
every row gets the full list for its partition:
+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23, 63, 31]|
|1   |2   |63  |[23, 63, 31]|
|1   |3   |31  |[23, 63, 31]|
|2   |1   |32  |[32, 56]    |
|2   |2   |56  |[32, 56]    |
+----+----+----+------------+
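Without orderBy, the order of elements inside each list is not guaranteed in general. To get the complete list on every row while still ordering by colB, an explicit frame can be set with rowsBetween. This is a sketch under assumptions: df is the example DataFrame, and, since collect_list and collect_set are backed by Hive UDAFs in Spark 1.6, df presumably has to come from a HiveContext. The names fullFrame and withFullList are illustrative.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.collect_list

// `df` is assumed to be the example DataFrame with columns colA, colB, colC.
// Long.MinValue and Long.MaxValue mark the frame as unbounded at the start
// and at the end, so the frame spans the whole partition.
val fullFrame = Window
  .partitionBy("colA")
  .orderBy("colB")
  .rowsBetween(Long.MinValue, Long.MaxValue)

// Every row of a partition should now carry the same, complete list.
val withFullList = df.withColumn("colD", collect_list("colC").over(fullFrame))

withFullList.show(false)
```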
I hope this answer helps.