Transform nested dictionary key values to pyspark dataframe
Question
I have a Pyspark dataframe that looks like this:
I would like to extract the nested dictionaries in the "dic" column and transform them into a PySpark dataframe, like this:
Please let me know how I can achieve this.

Thanks!
Answer
from pyspark.sql import functions as F
df.show() #sample dataframe
+---------+----------------------------------------------------------------------------------------------------------+
|timestmap|dic |
+---------+----------------------------------------------------------------------------------------------------------+
|timestamp|{"Name":"David","Age":"25","Location":"New York","Height":"170","fields":{"Color":"Blue","Shape":"round"}}|
+---------+----------------------------------------------------------------------------------------------------------+
For Spark 2.4+, you could use from_json and schema_of_json.
#infer the JSON schema from the first row, then parse and flatten the struct
schema = df.select(F.schema_of_json(df.select("dic").first()[0])).first()[0]
df.withColumn("dic", F.from_json("dic", schema))\
  .selectExpr("dic.*").selectExpr("*","fields.*").drop("fields").show()
#+---+------+--------+-----+-----+-----+
#|Age|Height|Location| Name|Color|Shape|
#+---+------+--------+-----+-----+-----+
#| 25| 170|New York|David| Blue|round|
#+---+------+--------+-----+-----+-----+
You could also use the rdd way with read.json if you don't have Spark 2.4. There will be a performance hit from the df-to-rdd conversion.
df1 = spark.read.json(df.rdd.map(lambda r: r.dic))
df1.select(*[x for x in df1.columns if x!='fields'], F.col("fields.*")).show()
#+---+------+--------+-----+-----+-----+
#|Age|Height|Location| Name|Color|Shape|
#+---+------+--------+-----+-----+-----+
#| 25| 170|New York|David| Blue|round|
#+---+------+--------+-----+-----+-----+