How to convert nested JSON to a map object in Scala

Question

I have the following JSON objects:

{
    "user_id": "123",
    "data": {
        "city": "New York"
    },
    "timestamp": "1563188698.31",
    "session_id": "6a793439-6535-4162-b333-647a6761636b"
}
{
    "user_id": "123",
    "data": {
        "name": "some_name",
        "age": "23",
        "occupation": "teacher"
    },
    "timestamp": "1563188698.31",
    "session_id": "6a793439-6535-4162-b333-647a6761636b"
}

I'm reading the file into a DataFrame with val df = sqlContext.read.json("json").

This merges the attributes of every record's data object into a single data struct:

root
 |-- data: struct (nullable = true)
 |    |-- age: string (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- occupation: string (nullable = true)
 |-- session_id: string (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- user_id: string (nullable = true)
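A minimal sketch that reproduces this inference, assuming a local SparkSession (spark-shell or a Scala script) and the two records supplied as one JSON string each; the names records, dataType and cities are illustrative. By default spark.read.json expects JSON Lines (one complete object per line), so the pretty-printed objects above are presumably stored one per line in the real file.

```scala
// Sketch only: assumes Spark is on the classpath and runs in local mode.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().master("local[*]").appName("infer-data-struct").getOrCreate()
import spark.implicits._

// The two records from the question, one JSON document per element.
val records = Seq(
  """{"user_id": "123", "data": {"city": "New York"}, "timestamp": "1563188698.31", "session_id": "6a793439-6535-4162-b333-647a6761636b"}""",
  """{"user_id": "123", "data": {"name": "some_name", "age": "23", "occupation": "teacher"}, "timestamp": "1563188698.31", "session_id": "6a793439-6535-4162-b333-647a6761636b"}"""
)

val df = spark.read.json(records.toDS)
df.printSchema()

// Inference takes the union of all keys seen under "data" ...
val dataType = df.schema("data").dataType.asInstanceOf[StructType]
println(dataType.fieldNames.mkString(", "))

// ... so a record that lacks a key gets null for that field.
val cities = df.select($"data.city").collect().map(r => Option(r.getString(0))).toSet
```

The null entries are what a plain struct cannot avoid: every row carries every field name, whether or not its JSON had it.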

Is it possible to turn the data field into a Map[String, String] instead, so that each row only has the attributes from its original JSON?

Answer

Yes. You can serialize the data struct back to JSON and parse it as a Map[String, String], as shown next:

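import org.apache.spark.sql.types.{MapType, StringType}
import org.apache.spark.sql.functions.{to_json, from_json}
// Needed for .toDS and the $"col" syntax (already in scope in spark-shell)
import spark.implicits._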

val jsonStr = """{
    "user_id": "123",
    "data": {
        "name": "some_name",
        "age": "23",
        "occupation": "teacher"
    },
    "timestamp": "1563188698.31",
    "session_id": "6a793439-6535-4162-b333-647a6761636b"
}"""

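// Each string in the Dataset is parsed as one JSON document
val df = spark.read.json(Seq(jsonStr).toDS)

// Target type for the data column: Map[String, String]
val mappingSchema = MapType(StringType, StringType)

// Serialize the data struct to a JSON string, then parse it back as a map
df.select(from_json(to_json($"data"), mappingSchema).as("map_data")).show(false)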

//Output
// +-----------------------------------------------------+
// |map_data                                             |
// +-----------------------------------------------------+
// |[age -> 23, name -> some_name, occupation -> teacher]|
// +-----------------------------------------------------+

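First, to_json($"data") serializes the content of the data field into a JSON string; then from_json(to_json($"data"), mappingSchema) parses that string into a Map[String, String]. Because to_json skips null struct fields by default, each map only contains the attributes that were present in that record.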
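A minimal sketch, assuming a local SparkSession, that runs both records from the question through the same to_json/from_json conversion. For comparison it also shows an alternative that is not part of the answer: declaring data as MapType(StringType, StringType) in an explicit read schema, so no struct is inferred in the first place. The helper maps and the value names are illustrative.

```scala
// Sketch only: assumes Spark is on the classpath and runs in local mode.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{from_json, to_json}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("json-to-map").getOrCreate()
import spark.implicits._

val records = Seq(
  """{"user_id": "123", "data": {"city": "New York"}, "timestamp": "1563188698.31", "session_id": "6a793439-6535-4162-b333-647a6761636b"}""",
  """{"user_id": "123", "data": {"name": "some_name", "age": "23", "occupation": "teacher"}, "timestamp": "1563188698.31", "session_id": "6a793439-6535-4162-b333-647a6761636b"}"""
)

// Approach from the answer: inferred struct -> JSON string -> map.
// Null struct fields (keys a record lacked) are dropped by to_json.
val inferred = spark.read.json(records.toDS)
val viaRoundTrip = inferred.select(
  from_json(to_json($"data"), MapType(StringType, StringType)).as("map_data"))

// Alternative: read "data" directly as a map using an explicit schema.
val explicitSchema = StructType(Seq(
  StructField("user_id", StringType),
  StructField("data", MapType(StringType, StringType)),
  StructField("timestamp", StringType),
  StructField("session_id", StringType)
))
val viaSchema = spark.read.schema(explicitSchema).json(records.toDS)
  .select($"data".as("map_data"))

viaRoundTrip.show(false)
viaSchema.show(false)

// Collect the maps as a set so the comparison does not depend on row order.
def maps(d: DataFrame): Set[Map[String, String]] =
  d.collect().map(_.getMap[String, String](0).toMap).toSet
```

The explicit schema skips the serialize/parse round trip, at the cost of having to list the top-level columns yourself.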
