How to unpivot a large spark dataframe?
Question
I have seen a few solutions to unpivot a Spark dataframe when the number of columns is reasonably low and the columns' names can be hardcoded. Do you have a scalable solution to unpivot a dataframe with numerous columns?
Below is a toy problem.

Input:
val df = Seq(
  (1, 1, 1, 0),
  (2, 0, 0, 1)
).toDF("ID", "A", "B", "C")
+---+---+---+---+
| ID|  A|  B|  C|
+---+---+---+---+
|  1|  1|  1|  0|
|  2|  0|  0|  1|
+---+---+---+---+
Expected result:
+---+-----+-----+
| ID|names|count|
+---+-----+-----+
|  1|    A|    1|
|  1|    B|    1|
|  1|    C|    0|
|  2|    A|    0|
|  2|    B|    0|
|  2|    C|    1|
+---+-----+-----+
The solution should be applicable to datasets with N columns to unpivot, where N is large (say 100 columns).
Answer
This should work; I am assuming you know the list of columns that you want to unpivot on.
import org.apache.spark.sql.{functions => func, _}

val df = Seq(
  (1, 1, 1, 0),
  (2, 0, 0, 1)
).toDF("ID", "A", "B", "C")

val cols = Seq("A", "B", "C")

df.select(
    $"ID",
    // Build one struct per column, collect them into an array,
    // then explode the array into one row per (name, value) pair.
    func.explode(
      func.array(
        cols.map(col =>
          func.struct(
            func.lit(col).alias("names"),
            func.col(col).alias("count")
          )
        ): _*
      )
    ).alias("v")
  )
  .selectExpr("ID", "v.*") // flatten the struct into names/count columns
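A related approach worth noting (a sketch, not part of the original answer): Spark SQL also has a built-in `stack` function that performs the same unpivot, and since the expression is just a string it can be generated programmatically from the column list, so it scales to hundreds of columns. The helper below only builds the SQL string; running the final line assumes a live `SparkSession` and the `df` defined above.

```scala
// Build a stack() expression that unpivots the given columns.
// stack(n, name1, col1, name2, col2, ...) emits n rows of (name, value) pairs.
def stackExpr(cols: Seq[String]): String = {
  val pairs = cols.map(c => s"'$c', `$c`").mkString(", ")
  s"stack(${cols.length}, $pairs) as (names, count)"
}

val cols = Seq("A", "B", "C")
val expr = stackExpr(cols)
// expr == "stack(3, 'A', `A`, 'B', `B`, 'C', `C`) as (names, count)"

// With a SparkSession in scope, the unpivot is then:
// df.selectExpr("ID", expr)
```

Backtick-quoting the column references keeps the generated SQL valid for column names containing spaces or dots.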