How can I convert each element of a DataFrame column to a different column?

Problem description

Assume I have a PySpark DataFrame like this:

import pandas
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession \
    .builder \
    .appName('stackoverflow') \
    .getOrCreate()

data = {
    'location_id': [1, 2, 3],
    'product_model_features': [
        [{'key': 'A', 'value': 'B'}, {'key': 'C', 'value': 'D'}, {'key': 'E', 'value': 'F'}],
        [{'key': 'A', 'value': 'H'}, {'key': 'E', 'value': 'J'}],
        [{'key': 'C', 'value': 'N'}, {'key': 'E', 'value': 'P'}]
    ]
}
df = pandas.DataFrame(data)
df = spark.createDataFrame(df)
df = df.withColumn('p', explode('product_model_features')) \
    .select('location_id', 'p.key', 'p.value')
df.show()

The output is:

 +-----------+---+-----+
 |location_id|key|value|
 +-----------+---+-----+
 |          1|  A|    B|
 |          1|  C|    D|
 |          1|  E|    F|
 |          2|  A|    H|
 |          2|  E|    J|
 |          3|  C|    N|
 |          3|  E|    P|
 +-----------+---+-----+

I want to turn the values of the "key" column into separate columns, each holding the corresponding "value". The desired output is shown below. Any ideas on how to do this in PySpark would be appreciated.

 +-----------+----+----+-+
 |location_id|A   |C   |E|
 +-----------+----+----+-+
 |          1|B   |D   |F|
 |          2|H   |Null|J|
 |          3|Null|N   |P|
 +-----------+----+----+-+


Recommended answer

You're looking for the pivot() function to reshape your DataFrame: group by location_id, pivot on key, and take the first value for each cell.

import pandas
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, first

spark = SparkSession \
    .builder \
    .appName('stackoverflow') \
    .getOrCreate()

data = {
    'location_id': [1, 2, 3],
    'product_model_features': [
        [{'key': 'A', 'value': 'B'}, {'key': 'C', 'value': 'D'}, {'key': 'E', 'value': 'F'}],
        [{'key': 'A', 'value': 'H'}, {'key': 'E', 'value': 'J'}],
        [{'key': 'C', 'value': 'N'}, {'key': 'E', 'value': 'P'}]
    ]
}
df = pandas.DataFrame(data)
df = spark.createDataFrame(df)

# One row per (location_id, key, value) entry
df = df \
    .withColumn('p', explode('product_model_features')) \
    .select('location_id', 'p.key', 'p.value')

# Each distinct key becomes a column; missing combinations become null
df = df \
    .groupby('location_id') \
    .pivot('key') \
    .agg(first('value')) \
    .sort('location_id')
df.show()

Output:

+-----------+----+----+---+
|location_id|   A|   C|  E|
+-----------+----+----+---+
|          1|   B|   D|  F|
|          2|   H|null|  J|
|          3|null|   N|  P|
+-----------+----+----+---+
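
To make it clearer what the groupby / pivot / first chain computes, here is a rough plain-Python sketch of the same reshaping. It is only an illustration and not how Spark implements pivot: the helper name pivot_first and the tuple-based row layout are made up for this example, and missing cells are represented by None in place of Spark's null.

```python
from collections import defaultdict

# Long-format rows as (location_id, key, value), mirroring the exploded DataFrame.
rows = [
    (1, 'A', 'B'), (1, 'C', 'D'), (1, 'E', 'F'),
    (2, 'A', 'H'), (2, 'E', 'J'),
    (3, 'C', 'N'), (3, 'E', 'P'),
]


def pivot_first(rows):
    """Hypothetical helper: group by id, make one column per distinct key,
    and keep the first value seen for each (id, key) pair."""
    # Sorted here only so the column order matches the table above.
    columns = sorted({key for _, key, _ in rows})

    grouped = defaultdict(dict)
    for location_id, key, value in rows:
        # setdefault keeps the first value and ignores later duplicates.
        grouped[location_id].setdefault(key, value)

    # Build one wide row per id; keys that never occurred yield None.
    table = [
        (location_id, *(cells.get(col) for col in columns))
        for location_id, cells in sorted(grouped.items())
    ]
    return columns, table


columns, table = pivot_first(rows)
print(['location_id'] + columns)
for row in table:
    print(row)
```

If a location could contain the same key more than once, "first" keeps only one of the values, so a different aggregate may be more appropriate in that case.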
