How to transpose Columns to Rows in a PySpark DataFrame?


Problem Description

For example, I have this DF:

+----+------+------+-------+
| id | val1 | val2 | val3  |
+----+------+------+-------+
|  1 |   a  |   b  |   c   |
|  2 |   d  |   e  |   f   |
|  3 |   g  |   h  |   i   |
+----+------+------+-------+

and I want the columns val1, val2 and val3 transposed into rows in this way:

+----+------+------+
| id |  key | value|
+----+------+------+
|  1 | val1 |   a  |
|  1 | val2 |   b  |
|  1 | val3 |   c  |
|  2 | val1 |   d  |
|  2 | val2 |   e  |
|  2 | val3 |   f  |
|  3 | val1 |   g  |
|  3 | val2 |   h  |
|  3 | val3 |   i  |
+----+------+------+

How can I achieve that?

Recommended Answer

This transformation can be achieved by exploding a Map structure. So first combine the three columns into a single Map column (based on https://stackoverflow.com/a/41291156/4725074), then explode it:

from pyspark.sql import functions as F
from itertools import chain

# Build a Map column: {"val1": val1, "val2": val2, "val3": val3}
m = F.create_map(list(chain(*(
        (F.lit(c), F.col(c)) for c in df.columns if c in ["val1", "val2", "val3"]))))

# explode() on a Map produces one row per entry, with key and value columns
df.withColumn('map', m)\
  .select('id', F.explode('map'))

It's important to use a select() and not a withColumn(), because exploding a Map column generates two new columns, key and value.

The result is the transposition of the selected columns into rows, as shown in the example.
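The map-then-explode logic can be illustrated outside Spark as well. The following is a minimal plain-Python sketch of the same unpivot, using sample data matching the question's DF (the variable names rows, value_cols and melted are illustrative, not Spark API):

```python
# Plain-Python sketch of the map-then-explode unpivot (not Spark API).
# Each input row's selected columns become one (id, key, value) row apiece,
# just as exploding the Map does in the PySpark snippet above.
rows = [
    {"id": 1, "val1": "a", "val2": "b", "val3": "c"},
    {"id": 2, "val1": "d", "val2": "e", "val3": "f"},
    {"id": 3, "val1": "g", "val2": "h", "val3": "i"},
]
value_cols = ["val1", "val2", "val3"]

# Build the per-row "map" and "explode" it into key/value rows
melted = [
    {"id": row["id"], "key": c, "value": row[c]}
    for row in rows
    for c in value_cols
]

print(melted[0])   # {'id': 1, 'key': 'val1', 'value': 'a'}
print(len(melted)) # 9 rows: 3 ids x 3 value columns
```

Each of the 3 input rows expands into 3 output rows, giving the 9-row result shown in the question.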

