Rename pivoted and aggregated column in PySpark Dataframe


Question

Given a DataFrame like the following:

from pyspark.sql.functions import avg, first

rdd = sc.parallelize(
    [
        (0, "A", 223,"201603", "PORT"), 
        (0, "A", 22,"201602", "PORT"), 
        (0, "A", 422,"201601", "DOCK"), 
        (1,"B", 3213,"201602", "DOCK"), 
        (1,"B", 3213,"201601", "PORT"), 
        (2,"C", 2321,"201601", "DOCK")
    ]
)
df_data = sqlContext.createDataFrame(rdd, ["id","type", "cost", "date", "ship"])

df_data.show()
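
The snippet above relies on the Spark 1.x shell globals sc and sqlContext. As a minimal sketch, assuming a Spark 2.x+ session object named spark, the same DataFrame can be built without them:

from pyspark.sql import SparkSession

# Assumption: running outside the shell, so the session is created here.
spark = SparkSession.builder.getOrCreate()

df_data = spark.createDataFrame(
    [
        (0, "A", 223, "201603", "PORT"),
        (0, "A", 22, "201602", "PORT"),
        (0, "A", 422, "201601", "DOCK"),
        (1, "B", 3213, "201602", "DOCK"),
        (1, "B", 3213, "201601", "PORT"),
        (2, "C", 2321, "201601", "DOCK"),
    ],
    ["id", "type", "cost", "date", "ship"],
)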

I make a pivot over this:

df_data.groupby(df_data.id, df_data.type).pivot("date").agg(avg("cost"), first("ship")).show()

+---+----+----------------+--------------------+----------------+--------------------+----------------+--------------------+
| id|type|201601_avg(cost)|201601_first(ship)()|201602_avg(cost)|201602_first(ship)()|201603_avg(cost)|201603_first(ship)()|
+---+----+----------------+--------------------+----------------+--------------------+----------------+--------------------+
|  2|   C|          2321.0|                DOCK|            null|                null|            null|                null|
|  0|   A|           422.0|                DOCK|            22.0|                PORT|           223.0|                PORT|
|  1|   B|          3213.0|                PORT|          3213.0|                DOCK|            null|                null|
+---+----+----------------+--------------------+----------------+--------------------+----------------+--------------------+

But I get these really complicated names for the columns. Applying alias on the aggregation usually works, but because of the pivot in this case the names are even worse:
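
For reference, the aliased call would look something like this (a reconstruction, not verbatim from the original post; on Spark 1.6 it produces the verbose names shown below):

df_data.groupby(df_data.id, df_data.type).pivot("date") \
    .agg(avg("cost").alias("cost"), first("ship").alias("ship")).show()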

+---+----+--------------------------------------------------------------+------------------------------------------------------------------+--------------------------------------------------------------+------------------------------------------------------------------+--------------------------------------------------------------+------------------------------------------------------------------+
| id|type|201601_(avg(cost),mode=Complete,isDistinct=false) AS cost#1619|201601_(first(ship)(),mode=Complete,isDistinct=false) AS ship#1620|201602_(avg(cost),mode=Complete,isDistinct=false) AS cost#1619|201602_(first(ship)(),mode=Complete,isDistinct=false) AS ship#1620|201603_(avg(cost),mode=Complete,isDistinct=false) AS cost#1619|201603_(first(ship)(),mode=Complete,isDistinct=false) AS ship#1620|
+---+----+--------------------------------------------------------------+------------------------------------------------------------------+--------------------------------------------------------------+------------------------------------------------------------------+--------------------------------------------------------------+------------------------------------------------------------------+
|  2|   C|                                                        2321.0|                                                              DOCK|                                                          null|                                                              null|                                                          null|                                                              null|
|  0|   A|                                                         422.0|                                                              DOCK|                                                          22.0|                                                              PORT|                                                         223.0|                                                              PORT|
|  1|   B|                                                        3213.0|                                                              PORT|                                                        3213.0|                                                              DOCK|                                                          null|                                                              null|
+---+----+--------------------------------------------------------------+------------------------------------------------------------------+--------------------------------------------------------------+------------------------------------------------------------------+--------------------------------------------------------------+------------------------------------------------------------------+ 

Is there a way to rename the column names on the fly on the pivot and aggregation?

Answer

A simple regular expression should do the trick:

import re

def clean_names(df):
    # Match pivoted column names like "201601_avg(cost)" or
    # "201601_first(ship)()", capturing the pivot value, the
    # aggregate function name, and the underlying column name.
    p = re.compile(r"^(\w+?)_([a-z]+)\((\w+)\)(?:\(\))?")
    # Keep only "<pivot value>_<column>", e.g. "201601_cost".
    return df.toDF(*[p.sub(r"\1_\3", c) for c in df.columns])

pivoted = df_data.groupby(...).pivot(...).agg(...)

clean_names(pivoted).printSchema()
## root
##  |-- id: long (nullable = true)
##  |-- type: string (nullable = true)
##  |-- 201601_cost: double (nullable = true)
##  |-- 201601_ship: string (nullable = true)
##  |-- 201602_cost: double (nullable = true)
##  |-- 201602_ship: string (nullable = true)
##  |-- 201603_cost: double (nullable = true)
##  |-- 201603_ship: string (nullable = true)

If you want to preserve the function name, change the substitution pattern to, for example, \1_\2_\3.
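
As a minimal sketch of that variant (the helper name is mine):

def clean_names_keep_fn(df):
    # Same pattern as above, but the replacement keeps the function
    # name, yielding e.g. "201601_avg_cost" instead of "201601_cost".
    p = re.compile(r"^(\w+?)_([a-z]+)\((\w+)\)(?:\(\))?")
    return df.toDF(*[p.sub(r"\1_\2_\3", c) for c in df.columns])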
