Apache Spark: concatenate multiple rows into a list in a single row

Question

I need to create a table (Hive table / Spark DataFrame) from a source table that stores each user's data across multiple rows, collapsing those rows into a list in a single row per user.

User table (schema: userid: string, transactiondate: string, charges: string, events: array<struct<name:string,value:string>>):

userid | transactiondate | charges | events
-------|-----------------|---------|-------------------------------------------
123    | 2017-09-01      | 20.00   | [{"name":"chargeperiod","value":"this"}]
123    | 2017-09-01      | 30.00   | [{"name":"chargeperiod","value":"last"}]
123    | 2017-09-01      | 20.00   | [{"name":"chargeperiod","value":"recent"}]
123    | 2017-09-01      | 30.00   | [{"name":"chargeperiod","value":"0"}]
456    | 2017-09-01      | 20.00   | [{"name":"chargeperiod","value":"this"}]
456    | 2017-09-01      | 30.00   | [{"name":"chargeperiod","value":"last"}]
456    | 2017-09-01      | 20.00   | [{"name":"chargeperiod","value":"recent"}]
456    | 2017-09-01      | 30.00   | [{"name":"chargeperiod","value":"0"}]

The output table should be:

userid: String | concatenatedlist: List[Row]
---------------|----------------------------
123            | [[2017-09-01,20.00,[{"name":"chargeperiod","value":"this"}]],[2017-09-01,30.00,[{"name":"chargeperiod","value":"last"}]],[2017-09-01,20.00,[{"name":"chargeperiod","value":"recent"}]],[2017-09-01,30.00,[{"name":"chargeperiod","value":"0"}]]]
456            | [[2017-09-01,20.00,[{"name":"chargeperiod","value":"this"}]],[2017-09-01,30.00,[{"name":"chargeperiod","value":"last"}]],[2017-09-01,20.00,[{"name":"chargeperiod","value":"recent"}]],[2017-09-01,30.00,[{"name":"chargeperiod","value":"0"}]]]
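To make the requested shape concrete, here is a minimal plain-Scala sketch (no Spark involved) that models the source rows and collapses them into one list per userid. The case classes Event and UserRow and the object GroupRowsSketch are hypothetical names for illustration only; they mirror the schema above but are not part of the question.

```scala
// Hypothetical plain-Scala model of the question's schema; no Spark needed.
case class Event(name: String, value: String)
case class UserRow(userid: String, transactiondate: String, charges: String, events: List[Event])

object GroupRowsSketch {
  // The eight sample rows from the user table above.
  val sampleRows: List[UserRow] = {
    val periods = List("this", "last", "recent", "0")
    val charges = List("20.00", "30.00", "20.00", "30.00")
    for {
      id     <- List("123", "456")
      (p, c) <- periods.zip(charges)
    } yield UserRow(id, "2017-09-01", c, List(Event("chargeperiod", p)))
  }

  // Collapse rows into one (userid, list of remaining fields) entry per user,
  // keeping input order within each user and first-appearance order across users.
  def collapse(rows: List[UserRow]): List[(String, List[(String, String, List[Event])])] =
    rows.map(_.userid).distinct.map { id =>
      id -> rows.filter(_.userid == id).map(r => (r.transactiondate, r.charges, r.events))
    }

  def main(args: Array[String]): Unit =
    collapse(sampleRows).foreach { case (id, list) => println(s"$id | $list") }
}
```

Each user ends up with a list of four (transactiondate, charges, events) tuples, which is the List[Row] shape the question asks for.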

Spark version: 1.6.2

Recommended answer

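import org.apache.spark.sql.functions._
// Spark 1.6: assumes sqlContext is a HiveContext, which collect_list needs in 1.6
import sqlContext.implicits._

// Sample data; "array" is a plain string here, standing in for the events column
val df = Seq(("1", "2017-02-01", "20.00", "abc"),
    ("1", "2017-02-01", "30.00", "abc2"),
    ("2", "2017-02-01", "20.00", "abc"),
    ("2", "2017-02-01", "30.00", "abc"))
  .toDF("id", "date", "amt", "array")

// Join each row's fields into one string, then join all of an id's strings into one value
df.withColumn("new", concat_ws(",", $"date", $"amt", $"array"))
  .select("id", "new")
  .groupBy("id")
  .agg(concat_ws(",", collect_list("new")).as("concatenatedlist"))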
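To see what this query produces for the sample data without starting Spark, here is a plain-Scala sketch that mimics the two concat_ws steps with mkString and the groupBy/collect_list step with a collection groupBy. The object name ConcatWsSketch is hypothetical, and unlike Spark it keeps input order within each group.

```scala
object ConcatWsSketch {
  // Sample rows from the answer: (id, date, amt, array)
  val rows: List[(String, String, String, String)] = List(
    ("1", "2017-02-01", "20.00", "abc"),
    ("1", "2017-02-01", "30.00", "abc2"),
    ("2", "2017-02-01", "20.00", "abc"),
    ("2", "2017-02-01", "30.00", "abc"))

  // Per row: like concat_ws(",", date, amt, array).
  // Per id:  like concat_ws(",", collect_list(new)).
  // Spark does not guarantee collect_list order after a shuffle; List.groupBy keeps input order.
  def aggregate(input: List[(String, String, String, String)]): Map[String, String] =
    input.groupBy(_._1).map { case (id, rs) =>
      id -> rs.map { case (_, d, a, arr) => List(d, a, arr).mkString(",") }.mkString(",")
    }

  def main(args: Array[String]): Unit =
    aggregate(rows).toList.sortBy(_._1).foreach { case (id, s) => println(s"$id | $s") }
}
```

For id "1" this yields 2017-02-01,20.00,abc,2017-02-01,30.00,abc2. Because both levels use "," as the separator, row boundaries are lost; an outer separator such as ";" keeps them recoverable. On Spark 2.0 and later, collect_list also accepts struct columns, so grouping with collect_list(struct($"date", $"amt", $"array")) keeps each row as a struct instead of flattening it to a string.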
