Spark dataframe: Pivot and Group based on columns


Question

I have an input dataframe as below with id, app, and customer columns.

Input dataframe

+--------------------+-----+---------+
|                  id|app  |customer |
+--------------------+-----+---------+
|id1                 |   fw|     WM  |
|id1                 |   fw|     CS  |
|id2                 |   fw|     CS  |
|id1                 |   fe|     WM  |
|id3                 |   bc|     TR  |
|id3                 |   bc|     WM  |
+--------------------+-----+---------+

Expected output

Using pivot and aggregate - make the app values column names and put the aggregated customer names as a list in the dataframe.

Expected dataframe

+--------------------+----------+-------+----------+
|                  id|   bc     |     fe|    fw    |
+--------------------+----------+-------+----------+
|id1                 |  0       |     WM|   [WM,CS]|
|id2                 |  0       |      0|   [CS]   |
|id3                 | [TR,WM]  |      0|      0   |
+--------------------+----------+-------+----------+

What have I tried?

val newDF = df.groupBy("id").pivot("app").agg(expr("coalesce(first(customer),0)")).drop("app").show()

+--------------------+-----+-------+------+
|                  id|bc   |     fe|    fw|
+--------------------+-----+-------+------+
|id1                 |  0  |     WM|    WM|
|id2                 |  0  |      0|    CS|
|id3                 | TR  |      0|     0|
+--------------------+-----+-------+------+

Issue: In my query, I am not able to get a list of customers like [WM, CS] for "id1" under "fw" (as shown in the expected output); only "WM" comes through. Similarly, for "id3" only "TR" appears, whereas a list with the value [TR, WM] should appear under "bc" for "id3".

Need your suggestion on how to get the list of customers under each app respectively.

Answer

You can use collect_list if you can live with an empty list in the cells where the expected output shows zero. Unlike first(customer), which keeps only one value per (id, app) group, collect_list gathers all customer values in the group into an array:

df.groupBy("id").pivot("app").agg(collect_list("customer")).show
+---+--------+----+--------+
| id|      bc|  fe|      fw|
+---+--------+----+--------+
|id3|[TR, WM]|  []|      []|
|id1|      []|[WM]|[CS, WM]|
|id2|      []|  []|    [CS]|
+---+--------+----+--------+
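For completeness, a self-contained sketch of the whole pipeline, assuming a local SparkSession (the app name "pivot-demo" is illustrative). The last step is optional: it converts the array columns to comma-separated strings and substitutes "0" for empty lists, to match the "0" cells in the expected output:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical local session so the example is runnable on its own
val spark = SparkSession.builder.appName("pivot-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Recreate the input dataframe from the question
val df = Seq(
  ("id1", "fw", "WM"), ("id1", "fw", "CS"), ("id2", "fw", "CS"),
  ("id1", "fe", "WM"), ("id3", "bc", "TR"), ("id3", "bc", "WM")
).toDF("id", "app", "customer")

// Pivot on app, collecting all customers per (id, app) into a list
val pivoted = df.groupBy("id").pivot("app").agg(collect_list("customer"))

// Optional: replace empty lists with "0", rendering non-empty lists
// as comma-separated strings so the column type stays consistent
val appCols = pivoted.columns.filter(_ != "id")
val result = appCols.foldLeft(pivoted) { (acc, c) =>
  acc.withColumn(c, when(size(col(c)) === 0, lit("0")).otherwise(concat_ws(",", col(c))))
}
result.show()
```

Note that when/otherwise must produce a single column type, which is why the arrays are flattened to strings before substituting "0"; if you prefer to keep real array columns, leave the empty lists as-is.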
