Collect rows as list with group by apache spark
Question
I have a particular use case with multiple rows for the same customer, where each row object looks like:
root
-c1: BigInt
-c2: String
-c3: Double
-c4: Double
-c5: Map[String, Int]
Now I have to group by column c1 and collect all the rows as a list for the same customer, like:
c1, [Row1, Row3, Row4]
c2, [Row2, Row5]
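The desired grouping can be sketched with plain Scala collections, with a hypothetical CustomerRow case class standing in for the Spark row (no Spark needed for the illustration):

```scala
// Hypothetical case class mirroring the row schema from the question.
case class CustomerRow(c1: BigInt, c2: String, c3: Double, c4: Double, c5: Map[String, Int])

// Group by c1 and collect the full rows as lists per customer; this is the
// collection-level analogue of groupBy("c1").agg(collect_list(...)) in Spark.
def collectRowsByCustomer(rows: Seq[CustomerRow]): Map[BigInt, Seq[CustomerRow]] =
  rows.groupBy(_.c1)
```

Each map value keeps the rows in their original encounter order, just as collect_list keeps one entry per input row.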
I tried doing it this way:
dataset.withColumn("combined", array("c1","c2","c3","c4","c5")).groupBy("c1").agg(collect_list("combined"))
but I get an exception:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'array(`c1`, `c2`, `c3`, `c4`, `c5`)' due to data type mismatch: input to function array should all be the same type, but it's [bigint, string, double, double, map<string,map<string,double>>];;
Answer
Instead of array, you can use the struct function to combine the columns (struct allows fields of different types, whereas array requires all elements to share one type), then use groupBy with the collect_list aggregation function:
import org.apache.spark.sql.functions._
df.withColumn("combined", struct("c1","c2","c3","c4","c5"))
.groupBy("c1").agg(collect_list("combined").as("combined_list"))
.show(false)
so that you get the grouped dataset with schema as
root
|-- c1: integer (nullable = false)
|-- combined_list: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- c1: integer (nullable = false)
| | |-- c2: string (nullable = true)
| | |-- c3: string (nullable = true)
| | |-- c4: string (nullable = true)
| | |-- c5: map (nullable = true)
| | | |-- key: string
| | | |-- value: integer (valueContainsNull = false)
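Each element of combined_list is a struct whose fields remain accessible by name, so downstream per-customer processing stays straightforward. At the plain-collection level (again using a hypothetical CustomerRow stand-in, no Spark required), reading a field out of each collected element looks like:

```scala
// Hypothetical case class mirroring the struct elements of combined_list.
case class CustomerRow(c1: BigInt, c2: String, c3: Double, c4: Double, c5: Map[String, Int])

// Given rows already grouped per customer (the analogue of combined_list),
// read a single field from each collected element, e.g. sum c3 per customer.
def sumC3PerCustomer(grouped: Map[BigInt, Seq[CustomerRow]]): Map[BigInt, Double] =
  grouped.map { case (customer, rows) => customer -> rows.map(_.c3).sum }
```

In Spark itself the same field access would go through the struct, e.g. with explode plus col("combined.c3"), but the principle is the same: the struct preserves the named, typed fields that array would have flattened away.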
I hope the answer is helpful.