Splitting a dictionary in a PySpark dataframe into individual columns

Question

I have a DataFrame (in PySpark) in which one column holds a dictionary in each row:

df.show()

It looks like this:

+----+---+-----------------------------+
|name|age|info                         |
+----+---+-----------------------------+
|rob |26 |{color: red, car: volkswagen}|
|evan|25 |{color: blue, car: mazda}    |
+----+---+-----------------------------+

Based on the comments, here is more information:

df.printSchema()

The column type is string:

root
 |-- name: string (nullable = true)
 |-- age: string (nullable = true)
 |-- dict: string (nullable = true)

Is it possible to take the keys from the dictionary (color and car) and make them columns in the dataframe, and have the values be the rows for those columns?

Expected result:

+----+---+-----+----------+
|name|age|color|car       |
+----+---+-----+----------+
|rob |26 |red  |volkswagen|
|evan|25 |blue |mazda     |
+----+---+-----+----------+

I don't know whether I need to use df.withColumn() and somehow iterate through the dictionary, picking each key and making a column out of it. I've tried to find answers, but most of them use Pandas rather than Spark, so I'm not sure whether the same logic applies.

Recommended answer

Suppose the Spark DataFrame column is named info, and the input string below is a value of the info column:

input_value = "[{Charge_Power:2.3, EVSE_PhaseAmp:10, charging_id:230V10A1X}, {Charge_Power:3.7, EVSE_PhaseAmp:16, charging_id:230V16A1X}]"

Expected output:

#+------------+-------------+-----------+
#|Charge_Power|EVSE_PhaseAmp|charging_id|
#+------------+-------------+-----------+
#|2.3         |10           |230V10A1X  |
#|3.7         |16           |230V16A1X  |
#+------------+-------------+-----------+
