How to use pivot to generate a single-row matrix?
Problem description
I need to pivot the following two-column DataFrame into a single-row one (long to wide).
+--------+-----+
| udate| cc|
+--------+-----+
|20090622| 458|
|20090624|31068|
|20090626| 151|
|20090629| 148|
|20090914| 453|
+--------+-----+
I need it in this format:
+--------+------------+----------+----------+
| udate| 20090622 | 20090624 | 20090626 |
+--------+------------+----------+----------+
| cc | 458| 31068 | 151 |etc
I ran this:
result_df.groupBy($"udate").pivot("udate").agg(max($"cc")).show()
but ended up with a matrix in which every row was also turned into a column:
+--------+--------+--------+--------+--------+--------+---
| udate|20090622|20090624|20090626|20090629|20090703|200
+--------+--------+--------+--------+--------+--------+---
|20090622| 458| null| null| null| null|
|20090624| null| 31068| null| null| null|
|20090626| null| null| 151| null| null|
|20090629| null| null| null| 148| null|
|20090703| null| null| null| null| 362|
|20090704| null| null| null| null| null|
|20090715| null| null| null| null| null|
|20090718| null| null| null| null| null|
|20090721| null| null| null| null| null|
|20090722| null| null| null| null| null|
I expected that pivoting a single-column dataset would produce a single-row pivoted dataset.
How can I modify the pivot command so that the result set is pivoted to one row?
Recommended answer
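tl;dr In Spark 2.4.0 it simply boils down to calling groupBy with no grouping columns. Grouping by udate produces one output row per distinct udate, which is where the diagonal matrix comes from; an empty groupBy() treats the whole dataset as a single group, so the pivot yields exactly one row. (Here d is presumably the two-column DataFrame from the question.)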
val solution = d.groupBy().pivot("udate").agg(first("cc"))
scala> solution.show
+--------+--------+--------+--------+--------+
|20090622|20090624|20090626|20090629|20090914|
+--------+--------+--------+--------+--------+
| 458| 31068| 151| 148| 453|
+--------+--------+--------+--------+--------+
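If you really need the first column with the names, just select a literal column (lit) ahead of all the pivoted columns and you're done. Using select rather than withColumn matters here, because withColumn would append the new column at the end instead of placing it first.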
val betterSolution = solution.select(lit("cc") as "udate", $"*")
scala> betterSolution.show
+-----+--------+--------+--------+--------+--------+
|udate|20090622|20090624|20090626|20090629|20090914|
+-----+--------+--------+--------+--------+--------+
| cc| 458| 31068| 151| 148| 453|
+-----+--------+--------+--------+--------+--------+
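Putting the two steps together, here is a minimal self-contained sketch. It assumes Spark is on the classpath (or that you paste it into spark-shell with :paste); the local master, the application name, and the hand-built sample data are my own assumptions for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, lit}

// Assumption: a local session for experimenting; in spark-shell,
// getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("pivot-to-single-row")
  .getOrCreate()
import spark.implicits._

// Sample data copied from the question's two-column DataFrame.
val d = Seq(
  (20090622, 458),
  (20090624, 31068),
  (20090626, 151),
  (20090629, 148),
  (20090914, 453)
).toDF("udate", "cc")

// An empty groupBy() makes the whole dataset one group,
// so pivoting on udate yields a single row.
val solution = d.groupBy().pivot("udate").agg(first("cc"))

// Prepend a literal label column so the row is named "cc".
val betterSolution = solution.select(lit("cc") as "udate", $"*")

betterSolution.show()
```

With this sample data, show() should print a single-row table like the one above. If the pivot column has many distinct values, passing them explicitly through the pivot(column, values) overload saves Spark the extra job it otherwise runs to discover them.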