How to use collect_set and collect_list functions in windowed aggregation in Spark 1.6?
Question
In Spark 1.6.0 / Scala, is there a way to get collect_list("colC") or collect_set("colC").over(Window.partitionBy("colA").orderBy("colB"))?
Recommended answer
Given that you have a dataframe such as
+----+----+----+
|colA|colB|colC|
+----+----+----+
|1   |1   |23  |
|1   |2   |63  |
|1   |3   |31  |
|2   |1   |32  |
|2   |2   |56  |
+----+----+----+
you can use Window functions by doing the following:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
df.withColumn("colD", collect_list("colC").over(Window.partitionBy("colA").orderBy("colB"))).show(false)
Result:
+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23]        |
|1   |2   |63  |[23, 63]    |
|1   |3   |31  |[23, 63, 31]|
|2   |1   |32  |[32]        |
|2   |2   |56  |[32, 56]    |
+----+----+----+------------+
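To reproduce the example on Spark 1.6, the sample dataframe can be built as in the sketch below. It assumes a HiveContext, since in Spark 1.x window functions (and, in 1.6, collect_list and collect_set) are generally only available through a HiveContext rather than a plain SQLContext. The object name SampleData, the app name, and the local[*] master are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper object; the name is not from the original answer.
object SampleData {
  // Builds the five-row example dataframe shown in the answer.
  def build(sqlContext: HiveContext): DataFrame = {
    import sqlContext.implicits._
    Seq((1, 1, 23), (1, 2, 63), (1, 3, 31), (2, 1, 32), (2, 2, 56))
      .toDF("colA", "colB", "colC")
  }

  def main(args: Array[String]): Unit = {
    // Master and app name are assumptions for a local test run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("collect-window-demo")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    build(sqlContext).show(false)
    sc.stop()
  }
}
```

In spark-shell, a sqlContext is already provided, so only the body of build is needed there.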
The result for collect_set is similar, but the order of elements in the resulting set is not guaranteed, unlike with collect_list:
df.withColumn("colD", collect_set("colC").over(Window.partitionBy("colA").orderBy("colB"))).show(false)
+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23]        |
|1   |2   |63  |[63, 23]    |
|1   |3   |31  |[63, 31, 23]|
|2   |1   |32  |[32]        |
|2   |2   |56  |[56, 32]    |
+----+----+----+------------+
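If a stable element order is wanted for the set, one option is to wrap the windowed collect_set in sort_array, which sorts the array by value in ascending order. This is a sketch rather than part of the original answer; the object and method names are hypothetical, and the result is ordered by the colC values themselves, not by colB.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_set, sort_array}

// Hypothetical helper object; the name is not from the original answer.
object SortedSetWindow {
  // Adds colD: the running set of colC values per colA, sorted ascending by value.
  def withSortedSet(df: DataFrame): DataFrame = {
    val w = Window.partitionBy("colA").orderBy("colB")
    df.withColumn("colD", sort_array(collect_set("colC").over(w)))
  }
}
```

Calling SortedSetWindow.withSortedSet(df).show(false) on the sample dataframe should then give the same sets as above, with the elements of each set in ascending order.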
If you remove orderBy as below
df.withColumn("colD", collect_list("colC").over(Window.partitionBy("colA"))).show(false)
the result would be
+----+----+----+------------+
|colA|colB|colC|colD        |
+----+----+----+------------+
|1   |1   |23  |[23, 63, 31]|
|1   |2   |63  |[23, 63, 31]|
|1   |3   |31  |[23, 63, 31]|
|2   |1   |32  |[32, 56]    |
|2   |2   |56  |[32, 56]    |
+----+----+----+------------+
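The difference between the two collect_list results comes from the window frame. With orderBy, the default frame runs from the start of the partition to the current row, which produces the running lists; without orderBy, the frame is the whole partition, but the element order is then not tied to colB. A sketch of a third option, assuming you want the full list on every row while still ordering by colB, is to keep orderBy and widen the frame explicitly with rowsBetween. In Spark 1.6, Long.MinValue and Long.MaxValue stand for unbounded preceding and unbounded following. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.collect_list

// Hypothetical helper object; the name is not from the original answer.
object FullFrameWindow {
  // Adds colD: all colC values of the colA partition, with the window ordered by colB.
  def withFullList(df: DataFrame): DataFrame = {
    val fullFrame = Window
      .partitionBy("colA")
      .orderBy("colB")
      // Frame spans the whole partition instead of ending at the current row.
      .rowsBetween(Long.MinValue, Long.MaxValue)
    df.withColumn("colD", collect_list("colC").over(fullFrame))
  }
}
```

Every row of a partition should then carry the same complete list, as in the table above.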
I hope the answer is helpful.