Spark Read Json: how to read a field that alternates between integer and struct


Question

I am trying to read multiple JSON files into a single dataframe. Every file has a "Value" node, but the type of that node alternates between integer and struct:

File 1:

{
   "Value": 123
}

File 2:

{
   "Value": {
      "Value": "On",
      "ValueType": "State",
      "IsSystemValue": true
   }
}

My goal is to read the files into a dataframe like this:

|---------------------|------------------|---------------------|------------------|
|         File        |       Value      |      ValueType      |   IsSystemValue  |
|---------------------|------------------|---------------------|------------------|
|      File1.json     |        123       |        null         |       null       |
|---------------------|------------------|---------------------|------------------|
|      File2.json     |        On        |        State        |       true       |
|---------------------|------------------|---------------------|------------------|

It is possible that all of the files read are like File 1 and none like File 2, or vice versa, or a mix of both. This is not known ahead of time. Any ideas?

Recommended answer

See whether this helps:

    /**
      * test/File1.json
      * -----
      * {
      * "Value": 123
      * }
      */
    /**
      * test/File2.json
      * ---------
      * {
      * "Value": {
      * "Value": "On",
      * "ValueType": "State",
      * "IsSystemValue": true
      * }
      * }
      */
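    // Assumes `spark` is an existing SparkSession and both files sit in the test/ resource directory.
    val path = getClass.getResource("/test").getPath
    val df = spark.read
      .option("multiLine", true) // each file is one JSON document spread over several lines
      .json(path)

    // Value is an integer in one file and a struct in the other, so Spark infers it as a string.
    df.show(false)
    df.printSchema()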

    /**
      * +-------------------------------------------------------+
      * |Value                                                  |
      * +-------------------------------------------------------+
      * |{"Value":"On","ValueType":"State","IsSystemValue":true}|
      * |123                                                    |
      * +-------------------------------------------------------+
      *
      * root
      * |-- Value: string (nullable = true)
      */



Transform the JSON string

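    import org.apache.spark.sql.functions.{coalesce, col, get_json_object, input_file_name, substring_index}

    // File: keep only the last path segment of the source file name.
    df.withColumn("File", substring_index(input_file_name(), "/", -1))
      // These lookups return null for rows whose Value is a plain integer such as 123.
      .withColumn("ValueType", get_json_object(col("Value"), "$.ValueType"))
      .withColumn("IsSystemValue", get_json_object(col("Value"), "$.IsSystemValue"))
      // Use the nested Value when present, otherwise keep the original scalar.
      .withColumn("Value", coalesce(get_json_object(col("Value"), "$.Value"), col("Value")))
      .show(false)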

    /**
      * +-----+----------+---------+-------------+
      * |Value|File      |ValueType|IsSystemValue|
      * +-----+----------+---------+-------------+
      * |On   |File2.json|State    |true         |
      * |123  |File1.json|null     |null         |
      * +-----+----------+---------+-------------+
      */
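
The question also notes that a given run might contain only integer-style files or only struct-style files. In that case schema inference would presumably give "Value" a long or struct type rather than string, and get_json_object expects a string column. One way to guard against this, sketched below, is to pass an explicit schema that declares "Value" as a string, so the column type does not depend on which files happen to be present. To my understanding, Spark's JSON reader then keeps a nested object as its raw JSON text. The object name `ExplicitSchemaSketch`, the helper `normalize`, and the in-memory sample strings are hypothetical and only stand in for the real files; with files on disk, `spark.read.schema(schema).option("multiLine", true).json(path)` would take the place of the in-memory read.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, get_json_object}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper object, not part of the original answer.
object ExplicitSchemaSketch {

  // Declare Value as a string so its type never depends on inference.
  val schema: StructType =
    StructType(Seq(StructField("Value", StringType, nullable = true)))

  // Parses one JSON document per element and flattens the Value field.
  def normalize(spark: SparkSession, jsonDocs: Seq[String]): DataFrame = {
    import spark.implicits._

    val raw = spark.read.schema(schema).json(jsonDocs.toDS())

    raw
      // Null when Value is a scalar, because the path does not exist.
      .withColumn("ValueType", get_json_object(col("Value"), "$.ValueType"))
      // get_json_object returns a string, so cast to get a real boolean column.
      .withColumn("IsSystemValue",
        get_json_object(col("Value"), "$.IsSystemValue").cast("boolean"))
      // Prefer the nested Value, fall back to the scalar text.
      .withColumn("Value",
        coalesce(get_json_object(col("Value"), "$.Value"), col("Value")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explicit-schema-sketch")
      .getOrCreate()

    // In-memory stand-ins for File1.json and File2.json (illustrative only).
    val samples = Seq(
      """{"Value": 123}""",
      """{"Value": {"Value": "On", "ValueType": "State", "IsSystemValue": true}}"""
    )

    normalize(spark, samples).show(false)
    spark.stop()
  }
}
```

Because the schema is fixed, the same transformation should apply whether the input holds only integer-style files, only struct-style files, or both. The trade-off is that "Value" stays a string, so any numeric use needs a later cast.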
