Unpivot in spark-sql/pyspark

Question

I need to unpivot a table in spark-sql/pyspark. I have gone through the documentation and found support for pivot, but none for unpivot so far. Is there a way to achieve this?

Suppose my initial table looks like this:

When I pivot it in pyspark with the following command:

df.groupBy("A").pivot("B").sum("C")

I get this as output:

Now I want to unpivot the pivoted table. In general, this operation may or may not yield the original table, depending on how the original table was pivoted.

As of now, Spark SQL does not provide out-of-the-box support for unpivot. Is there a way to achieve this?
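To illustrate why unpivoting may not return the original table, here is a minimal plain-Python sketch (no Spark required). The rows and the helper names `pivot_sum` and `unpivot` are hypothetical, chosen only for illustration; `pivot_sum` imitates what groupBy("A").pivot("B").sum("C") does to (A, B, C) rows.

```python
from collections import defaultdict


def pivot_sum(rows):
    """Imitate groupBy("A").pivot("B").sum("C") on (A, B, C) tuples."""
    table = defaultdict(dict)
    for a, b, c in rows:
        # Rows sharing the same (A, B) pair are summed into a single cell.
        table[a][b] = table[a].get(b, 0) + c
    return dict(table)


def unpivot(table):
    """Turn {A: {B: C}} back into sorted (A, B, C) tuples."""
    return sorted((a, b, c) for a, cols in table.items() for b, c in cols.items())


# Hypothetical input: ("G", "X") appears twice, so the pivot merges those rows.
original = [("G", "X", 1), ("G", "X", 3), ("G", "Y", 2), ("H", "Y", 4), ("H", "Z", 5)]
round_trip = unpivot(pivot_sum(original))
print(round_trip)
# [('G', 'X', 4), ('G', 'Y', 2), ('H', 'Y', 4), ('H', 'Z', 5)]
```

The two ("G", "X") rows collapse into one row with the value 4, so the round trip recovers the aggregated values but not the original rows.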

Recommended answer

You can use the built-in stack function, for example in Scala:

scala> val df = Seq(("G",Some(4),2,None),("H",None,4,Some(5))).toDF("A","X","Y", "Z")
df: org.apache.spark.sql.DataFrame = [A: string, X: int ... 2 more fields]

scala> df.show
+---+----+---+----+
|  A|   X|  Y|   Z|
+---+----+---+----+
|  G|   4|  2|null|
|  H|null|  4|   5|
+---+----+---+----+


scala> df.select($"A", expr("stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)")).where("C is not null").show
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  G|  X|  4|
|  G|  Y|  2|
|  H|  Y|  4|
|  H|  Z|  5|
+---+---+---+

Or in pyspark:

In [1]: df = spark.createDataFrame([("G",4,2,None),("H",None,4,5)],list("AXYZ"))

In [2]: df.show()
+---+----+---+----+
|  A|   X|  Y|   Z|
+---+----+---+----+
|  G|   4|  2|null|
|  H|null|  4|   5|
+---+----+---+----+

In [3]: df.selectExpr("A", "stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)").where("C is not null").show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  G|  X|  4|
|  G|  Y|  2|
|  H|  Y|  4|
|  H|  Z|  5|
+---+---+---+
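
When there are many columns to unpivot, writing the stack expression by hand gets tedious. The following is a minimal sketch of a helper that builds the same expression string from a list of column names. The name `build_stack_expr` and its parameters are hypothetical, not part of Spark, and the sketch assumes plain column names with no spaces or special characters.

```python
def build_stack_expr(value_cols, key_name="B", value_name="C"):
    """Return a Spark SQL stack() expression that unpivots value_cols.

    Hypothetical helper: assumes simple column names that need no quoting.
    """
    if not value_cols:
        raise ValueError("need at least one column to unpivot")
    # Each column contributes a pair: its name as a string literal,
    # followed by a reference to the column itself.
    pairs = ", ".join(f"'{c}', {c}" for c in value_cols)
    return f"stack({len(value_cols)}, {pairs}) as ({key_name}, {value_name})"


stack_expr = build_stack_expr(["X", "Y", "Z"])
print(stack_expr)
# stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)
```

The resulting string can be passed in place of the literal above, for example df.selectExpr("A", stack_expr).where("C is not null"). On Spark 3.4 or later, DataFrame.unpivot (alias melt) covers the same case without building the expression by hand.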
