How to use pivot to generate a single-row matrix?


Question

I need to pivot the following two-column DataFrame into a single-row one (long to wide).

+--------+-----+
|   udate|   cc|
+--------+-----+
|20090622|  458|
|20090624|31068|
|20090626|  151|
|20090629|  148|
|20090914|  453|
+--------+-----+

I need it in this format:

+-----+--------+--------+--------+-----
|udate|20090622|20090624|20090626| ...
+-----+--------+--------+--------+-----
|   cc|     458|   31068|     151| ...
+-----+--------+--------+--------+-----

I ran this:

result_df.groupBy($"udate").pivot("udate").agg(max($"cc")).show()

But I ended up with a matrix in which every row was also transposed into its own column:

+--------+--------+--------+--------+--------+--------+---
|   udate|20090622|20090624|20090626|20090629|20090703|200
+--------+--------+--------+--------+--------+--------+---
|20090622|     458|    null|    null|    null|    null|   
|20090624|    null|   31068|    null|    null|    null|   
|20090626|    null|    null|     151|    null|    null|   
|20090629|    null|    null|    null|     148|    null|   
|20090703|    null|    null|    null|    null|     362|   
|20090704|    null|    null|    null|    null|    null|   
|20090715|    null|    null|    null|    null|    null|   
|20090718|    null|    null|    null|    null|    null|   
|20090721|    null|    null|    null|    null|    null|   
|20090722|    null|    null|    null|    null|    null|

I expected that pivoting a one-column dataset would produce a one-row pivoted dataset.

How can I modify the pivot command so that the result set is pivoted to one row?

Recommended answer

tl;dr In Spark 2.4.0 it boils down to calling groupBy() with no grouping columns, so that the whole dataset is treated as a single group and the pivot produces a single row.

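// d is the two-column DataFrame from the question (called result_df there)
import org.apache.spark.sql.functions.first
val solution = d.groupBy().pivot("udate").agg(first("cc"))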
scala> solution.show
+--------+--------+--------+--------+--------+
|20090622|20090624|20090626|20090629|20090914|
+--------+--------+--------+--------+--------+
|     458|   31068|     151|     148|     453|
+--------+--------+--------+--------+--------+
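
To see why the grouping key makes the difference, here is a rough model of groupBy followed by pivot written with plain Scala collections. It is not Spark code, and the names PivotModel, rows, columns and pivot are made up for this illustration. It only mimics the idea that each group becomes one output row and each distinct pivot value becomes one output column.

```scala
// Plain Scala collections model of groupBy + pivot (an illustration, not Spark's implementation).
object PivotModel {
  // Sample (udate, cc) pairs copied from the question.
  val rows: Seq[(String, Int)] = Seq(
    "20090622" -> 458,
    "20090624" -> 31068,
    "20090626" -> 151,
    "20090629" -> 148,
    "20090914" -> 453
  )

  // One output column per distinct pivot value, sorted.
  val columns: Seq[String] = rows.map(_._1).distinct.sorted

  // Each group becomes one output row; a cell holds the first cc found
  // in that group for the column's udate, or None (null in Spark).
  def pivot[K](groupKey: ((String, Int)) => K): Map[K, Seq[Option[Int]]] =
    rows.groupBy(groupKey).map { case (key, group) =>
      key -> columns.map(c => group.find(_._1 == c).map(_._2))
    }

  // Grouping by udate: one group per date, so one mostly empty row per date.
  val diagonal: Map[String, Seq[Option[Int]]] = pivot(_._1)

  // Grouping by nothing: a single group, so a single fully populated row.
  val singleRow: Map[Unit, Seq[Option[Int]]] = pivot(_ => ())
}
```

In this model, diagonal has five rows with one filled cell each, which corresponds to the matrix in the question, while singleRow has exactly one row with every cell filled, which corresponds to the groupBy() result above.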

If you really need the first column with the name, just select a literal column in front of the pivoted columns and you're done.

val betterSolution = solution.select(lit("cc") as "udate", $"*")
scala> betterSolution.show
+-----+--------+--------+--------+--------+--------+
|udate|20090622|20090624|20090626|20090629|20090914|
+-----+--------+--------+--------+--------+--------+
|   cc|     458|   31068|     151|     148|     453|
+-----+--------+--------+--------+--------+--------+
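
For anyone who wants to try this outside the REPL, the two steps can be put together as a small standalone program. This is only a sketch under a few assumptions: the object and method names (SingleRowPivot, pivotToSingleRow) are invented here, the session is a local one, and udate is modelled as a string column since the question does not show the schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{first, lit}

object SingleRowPivot {
  // Builds the sample data from the question and pivots it to a single row.
  def pivotToSingleRow(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Assumption: udate is a string column; the question does not show the schema.
    val d = Seq(
      ("20090622", 458),
      ("20090624", 31068),
      ("20090626", 151),
      ("20090629", 148),
      ("20090914", 453)
    ).toDF("udate", "cc")

    // No grouping columns: the whole dataset is one group, so pivot yields one row.
    val solution = d.groupBy().pivot("udate").agg(first("cc"))

    // Put a literal label column named "udate" in front of the pivoted columns.
    solution.select(lit("cc") as "udate", $"*")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; master and appName are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-row-pivot")
      .getOrCreate()

    pivotToSingleRow(spark).show()
    spark.stop()
  }
}
```

If the distinct udate values are known in advance, pivot also accepts them as a second argument, for example pivot("udate", Seq("20090622", "20090624")), in which case Spark does not have to compute them first.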
