How can I convert each element of a DataFrame column to a different column?
Question
Assume I have a PySpark DataFrame like this:
import pandas
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession \
    .builder \
    .appName('stackoverflow') \
    .getOrCreate()

data = {
    'location_id': [1, 2, 3],
    'product_model_features': [
        [{'key': 'A', 'value': 'B'}, {'key': 'C', 'value': 'D'}, {'key': 'E', 'value': 'F'}],
        [{'key': 'A', 'value': 'H'}, {'key': 'E', 'value': 'J'}],
        [{'key': 'C', 'value': 'N'}, {'key': 'E', 'value': 'P'}]
    ]
}

df = pandas.DataFrame(data)
df = spark.createDataFrame(df)
df = df.withColumn('p', explode('product_model_features')) \
    .select('location_id', 'p.key', 'p.value')
df.show()
The output is:
+-----------+---+-----+
|location_id|key|value|
+-----------+---+-----+
|          1|  A|    B|
|          1|  C|    D|
|          1|  E|    F|
|          2|  A|    H|
|          2|  E|    J|
|          3|  C|    N|
|          3|  E|    P|
+-----------+---+-----+
I want to turn the values of the "key" column into separate columns, each holding the corresponding "value". The desired output is shown below. Please let me know if you have an idea how to do this in PySpark.
+-----------+----+----+-+
|location_id|A   |C   |E|
+-----------+----+----+-+
|          1|B   |D   |F|
|          2|H   |Null|J|
|          3|Null|N   |P|
+-----------+----+----+-+
Recommended answer
You're looking for the pivot() function to transform your DataFrame: group by location_id, pivot on key, and take the first value for each resulting cell.
import pandas
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, first

spark = SparkSession \
    .builder \
    .appName('stackoverflow') \
    .getOrCreate()

data = {
    'location_id': [1, 2, 3],
    'product_model_features': [
        [{'key': 'A', 'value': 'B'}, {'key': 'C', 'value': 'D'}, {'key': 'E', 'value': 'F'}],
        [{'key': 'A', 'value': 'H'}, {'key': 'E', 'value': 'J'}],
        [{'key': 'C', 'value': 'N'}, {'key': 'E', 'value': 'P'}]
    ]
}

df = pandas.DataFrame(data)
df = spark.createDataFrame(df)

# Explode the array so each key/value pair gets its own row.
df = df \
    .withColumn('p', explode('product_model_features')) \
    .select('location_id', 'p.key', 'p.value')

# One row per location_id; each distinct key becomes a column
# holding the first value found for that (location_id, key) pair.
df = df \
    .groupby('location_id') \
    .pivot('key') \
    .agg(first('value')) \
    .sort('location_id')
df.show()
Output:
+-----------+----+----+---+
|location_id|   A|   C|  E|
+-----------+----+----+---+
|          1|   B|   D|  F|
|          2|   H|null|  J|
|          3|null|   N|  P|
+-----------+----+----+---+
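To make it clearer what the groupby / pivot / first combination computes, here is a rough pure-Python sketch of the same reshaping on the exploded rows. The helper name pivot_first is hypothetical and is not part of the PySpark API; it only mimics the semantics on a small in-memory list, assuming the rows arrive in the order shown.

```python
def pivot_first(rows, group_key, pivot_key, value_key):
    """Reshape long-format rows into one wide row per group.

    Illustrative stand-in for groupby(...).pivot(...).agg(first(...));
    not a PySpark function.
    """
    # Distinct pivot values, sorted, become the new column names.
    columns = sorted({row[pivot_key] for row in rows})

    groups = {}
    for row in rows:
        cells = groups.setdefault(row[group_key], {})
        # setdefault keeps the first value seen for each (group, pivot) pair.
        cells.setdefault(row[pivot_key], row[value_key])

    # Missing (group, pivot) combinations become None, like null in Spark.
    return [
        {group_key: g, **{c: groups[g].get(c) for c in columns}}
        for g in sorted(groups)
    ]


rows = [
    {'location_id': 1, 'key': 'A', 'value': 'B'},
    {'location_id': 1, 'key': 'C', 'value': 'D'},
    {'location_id': 1, 'key': 'E', 'value': 'F'},
    {'location_id': 2, 'key': 'A', 'value': 'H'},
    {'location_id': 2, 'key': 'E', 'value': 'J'},
    {'location_id': 3, 'key': 'C', 'value': 'N'},
    {'location_id': 3, 'key': 'E', 'value': 'P'},
]

result = pivot_first(rows, 'location_id', 'key', 'value')
for r in result:
    print(r)
```

In this data every (location_id, key) pair occurs at most once, so first('value') simply picks that single value. If a pair could occur several times, a different aggregate may be a better fit.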