Pyspark: explode json in column to multiple columns
Question

The data looks like this -
+---+-----+-----------------------------+
| id|point|                         data|
+---+-----+-----------------------------+
|abc|    6|{"key1":"124", "key2": "345"}|
|dfl|    7|{"key1":"777", "key2": "888"}|
|4bd|    6|{"key1":"111", "key2": "788"}|
+---+-----+-----------------------------+
I am trying to break it into the following format.
+---+-----+----+----+
| id|point|key1|key2|
+---+-----+----+----+
|abc|    6| 124| 345|
|dfl|    7| 777| 888|
|4bd|    6| 111| 788|
+---+-----+----+----+
The explode function explodes the dataframe into multiple rows. But that is not the desired solution.
Note: This solution does not answer my question: PySpark "explode" dict in column
Answer
As long as you are using Spark version 2.1 or higher, pyspark.sql.functions.from_json should get you your desired result, but you would need to first define the required schema:
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

# Schema describing the JSON object stored in the "data" column
schema = StructType(
    [
        StructField('key1', StringType(), True),
        StructField('key2', StringType(), True)
    ]
)

# Parse the JSON string into a struct, then flatten it with "data.*"
df.withColumn("data", from_json("data", schema))\
    .select(col('id'), col('point'), col('data.*'))\
    .show()
This should give you:
+---+-----+----+----+
| id|point|key1|key2|
+---+-----+----+----+
|abc|    6| 124| 345|
|dfl|    7| 777| 888|
|4bd|    6| 111| 788|
+---+-----+----+----+