Pyspark: explode json in column to multiple columns
Question

The data looks like this -
+---+-----+-----------------------------+
| id|point|                         data|
+---+-----+-----------------------------+
|abc|    6|{"key1":"124", "key2": "345"}|
|dfl|    7|{"key1":"777", "key2": "888"}|
|4bd|    6|{"key1":"111", "key2": "788"}|
+---+-----+-----------------------------+
I am trying to break it into the following format.
+---+-----+----+----+
| id|point|key1|key2|
+---+-----+----+----+
|abc|    6| 124| 345|
|dfl|    7| 777| 888|
|4bd|    6| 111| 788|
+---+-----+----+----+
The explode function explodes the dataframe into multiple rows. But that is not the desired solution.
Note: This solution does not answer my question: PySpark "explode" dict in column
Answer
As long as you are using Spark version 2.1 or higher, pyspark.sql.functions.from_json should get you your desired result, but you would need to first define the required schema:
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

# Schema describing the JSON object stored in the "data" column
schema = StructType(
    [
        StructField('key1', StringType(), True),
        StructField('key2', StringType(), True)
    ]
)

# Parse the JSON string into a struct, then flatten it with "data.*"
df.withColumn("data", from_json("data", schema))\
    .select(col('id'), col('point'), col('data.*'))\
    .show()
This should give you:
+---+-----+----+----+
| id|point|key1|key2|
+---+-----+----+----+
|abc|    6| 124| 345|
|dfl|    7| 777| 888|
|4bd|    6| 111| 788|
+---+-----+----+----+