Splitting a dictionary in a Pyspark dataframe into individual columns
Question
I have a dataframe (in Pyspark) that has one of the row values as a dictionary:
df.show()
It looks like:
+----+---+-----------------------------+
|name|age|info                         |
+----+---+-----------------------------+
|rob |26 |{color: red, car: volkswagen}|
|evan|25 |{color: blue, car: mazda}    |
+----+---+-----------------------------+
Based on the comments, here is some more information:
df.printSchema()
The type is string:
root
|-- name: string (nullable = true)
|-- age: string (nullable = true)
|-- dict: string (nullable = true)
Is it possible to take the keys from the dictionary (color and car) and make them columns in the dataframe, and have the values be the rows for those columns?
Expected result:
+----+---+-----+----------+
|name|age|color|car       |
+----+---+-----+----------+
|rob |26 |red  |volkswagen|
|evan|25 |blue |mazda     |
+----+---+-----+----------+
I'm not sure whether I have to use df.withColumn() and somehow iterate through the dictionary to pick each key and then make a column out of it. I've tried to find some answers so far, but most of them use Pandas rather than Spark, so I'm not sure whether the same logic applies.
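One complication worth noting: as printed, the info values ({color: red, car: volkswagen}) are not valid JSON, because the keys and values are unquoted, so from_json alone will not parse them. A minimal sketch of one possible approach is to parse the string in a Python UDF; the helper name parse_info and the exact separators below are assumptions based on the sample rows above, not code from the original post:

```python
def parse_info(s):
    """Parse a string like '{color: red, car: volkswagen}' into a dict.

    Assumes the format shown in the sample rows: curly braces around
    ', '-separated 'key: value' pairs. All values are kept as strings.
    """
    if s is None:
        return {}
    body = s.strip().strip("{}")
    if not body:
        return {}
    return dict(kv.split(": ", 1) for kv in body.split(", "))

# In Spark, this helper could be registered as a UDF returning a map,
# and each key then promoted to its own column (sketch, not from the
# original post):
#   from pyspark.sql import functions as F
#   from pyspark.sql.types import MapType, StringType
#   parse_udf = F.udf(parse_info, MapType(StringType(), StringType()))
#   df2 = df.withColumn("m", parse_udf("info"))
#   df2 = df2.select("name", "age",
#                    F.col("m")["color"].alias("color"),
#                    F.col("m")["car"].alias("car"))
```

Selecting each map key with an alias produces the expected result table above.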
Recommended answer
The Spark DataFrame column name is info, and below is an input string which is a value of the info column:
input_value is :-"[{Charge_Power:2.3, EVSE_PhaseAmp:10, charging_id:230V10A1X}, {Charge_Power:3.7, EVSE_PhaseAmp:16, charging_id:230V16A1X}]"
Expected output:
#+------------+-------------+-----------+
#|Charge_Power|EVSE_PhaseAmp|charging_id|
#+------------+-------------+-----------+
#|2.3         |10           |230V10A1X  |
#|3.7         |16           |230V16A1X  |
#+------------+-------------+-----------+
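The answer as archived does not include the transformation itself. A hedged sketch of the core parsing logic for the array-valued string above, testable outside Spark (the helper name parse_records and the separator handling are assumptions based on the sample input, not code from the original answer):

```python
def parse_records(s):
    """Parse a string like
    '[{Charge_Power:2.3, EVSE_PhaseAmp:10, charging_id:230V10A1X}, ...]'
    into a list of dicts (all values kept as strings)."""
    body = s.strip().strip('"').strip("[]")
    records = []
    for chunk in body.split("}, {"):
        rec = {}
        for kv in chunk.strip("{}").split(", "):
            key, value = kv.split(":", 1)
            rec[key.strip()] = value.strip()
        records.append(rec)
    return records

# In Spark, this could be wrapped in a UDF returning an array of maps,
# exploded into one row per inner record, and each key selected as a
# column (sketch, not from the original answer):
#   from pyspark.sql import functions as F
#   from pyspark.sql.types import ArrayType, MapType, StringType
#   parse_udf = F.udf(parse_records,
#                     ArrayType(MapType(StringType(), StringType())))
#   exploded = df.select(F.explode(parse_udf("info")).alias("rec"))
#   result = exploded.select(
#       F.col("rec")["Charge_Power"].alias("Charge_Power"),
#       F.col("rec")["EVSE_PhaseAmp"].alias("EVSE_PhaseAmp"),
#       F.col("rec")["charging_id"].alias("charging_id"))
```

Exploding the parsed array yields one row per inner record, matching the expected output table above.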