Parsing Nested JSON into STRUCT type BQ table


Problem description

I am trying to load the following data into BigQuery (BQ) to create a table with a STRUCT type column. I am uploading the file using the Upload option in the BigQuery web UI, with schema auto-detection enabled.

{"property": [
    {
      "NAME": "65874aca2143",
      "VALUE": [
        {
          "NAME": "time",
          "VALUE": [
            {
              "NAME": "$date",
              "VALUE": "2020-06-16T09:42:49.449Z"
            }
          ]
        },
        {
          "NAME": "type",
          "VALUE": "ACTION"
        },
        {
          "NAME": "id",
          "VALUE": "1234"
        }
      ]
    }
  ]}

But it gives me the error below.

Error while reading data, error message: Failed to parse JSON: No active field found.; ParsedString returned false; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Could not parse value; Parser terminated before end of string.

Is anything wrong with my data, or am I violating any BQ rules?

Recommended answer

    1. BigQuery uses the JSON Lines (newline-delimited JSON) structure to load JSON data, in which each JSON object must be on its own line. You need to format your JSON object into a single line:

    { "property":[ { "NAME":"65874aca2143", "VALUE":[ { "NAME":"time", "VALUE":[ { "NAME":"$date", "VALUE":"2020-06-16T09:42:49.449Z" } ] }, { "NAME":"type", "VALUE":"ACTION" }, { "NAME":"id", "VALUE":"1234" } ] } ] }
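The single-line conversion can be scripted rather than done by hand. The following is a minimal sketch using only Python's standard `json` module; the helper name `to_json_line` and the shortened sample document are assumptions made for illustration, not part of the original answer.

```python
import json


def to_json_line(pretty_json: str) -> str:
    """Re-serialize one JSON document as a single JSON Lines record."""
    obj = json.loads(pretty_json)
    # With no indent, json.dumps emits no raw newlines (newlines inside
    # string values are escaped), so the result is one physical line.
    return json.dumps(obj, separators=(",", ":"))


# Shortened, pretty-printed sample based on the question's data.
pretty = """
{"property": [
    {
      "NAME": "65874aca2143",
      "VALUE": [
        {"NAME": "type", "VALUE": "ACTION"},
        {"NAME": "id", "VALUE": "1234"}
      ]
    }
]}
"""

line = to_json_line(pretty)
print(line)
```

For a file holding several objects, each object would be converted this way and written on its own line.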
    

    2. However, that will still throw an error. With auto-detect enabled, BigQuery tries to infer the schema from the provided file, and arrays of objects are treated as REPEATED RECORD (STRUCT) types. It therefore expects every element of an array to have the same schema, which is not the case for this "VALUE" array:

    "VALUE": [
        // this first element has a different schema ("VALUE" is an array, not a string):
        {
          "NAME": "time",
          "VALUE": [
            {
              "NAME": "$date",
              "VALUE": "2020-06-16T09:42:49.449Z"
            }
          ]
        },
        {
          "NAME": "type",
          "VALUE": "ACTION"
        },
        {
          "NAME": "id",
          "VALUE": "1234"
        }
      ]
    

    If, for example, this was changed into:

    "VALUE":[
                {
                   "NAME":"time",
                   "VALUE": "2020-06-16T09:42:49.449Z"
                },
                {
                   "NAME":"type",
                   "VALUE":"ACTION"
                },
                {
                   "NAME":"id",
                   "VALUE":"1234"
                }
             ]
    

    it would work (after formatting it into a single line, of course). So you also need to restructure your data so that all elements of REPEATED data have the same schema.
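One way to do that restructuring is to collapse each nested `NAME`/`VALUE` wrapper to its innermost scalar before loading. This is only a sketch under a strong assumption: every nested list holds exactly one wrapper object, as in the sample data. The helper names `flatten_value` and `normalize` are hypothetical.

```python
import json


def flatten_value(value):
    """Collapse a single-element NAME/VALUE list to its innermost scalar.

    Assumption: a nested list holds exactly one {"NAME", "VALUE"} object.
    Lists with more than one element are returned unchanged.
    """
    while isinstance(value, list) and len(value) == 1 and isinstance(value[0], dict):
        value = value[0]["VALUE"]
    return value


def normalize(record):
    """Give every element of each inner "VALUE" array the same shape."""
    for prop in record["property"]:
        prop["VALUE"] = [
            {"NAME": item["NAME"], "VALUE": flatten_value(item["VALUE"])}
            for item in prop["VALUE"]
        ]
    return record


record = {
    "property": [
        {
            "NAME": "65874aca2143",
            "VALUE": [
                {"NAME": "time",
                 "VALUE": [{"NAME": "$date", "VALUE": "2020-06-16T09:42:49.449Z"}]},
                {"NAME": "type", "VALUE": "ACTION"},
                {"NAME": "id", "VALUE": "1234"},
            ],
        }
    ]
}

normalized = normalize(record)
# Serialize as one line so the result is also a valid JSON Lines record.
print(json.dumps(normalized, separators=(",", ":")))
```

Note that this drops the inner `$date` name. If that name matters, a different target schema would be needed.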

    You can also consider storing the entire JSON object in a single STRING column, and then querying its elements with the BigQuery JSON functions.
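A sketch of how the single-column variant might be prepared: the whole document is serialized to a string and wrapped in a one-field JSON Lines record. The column name `raw` and the helper `wrap_as_string_row` are assumptions chosen for illustration.

```python
import json


def wrap_as_string_row(raw_json: str, column: str = "raw") -> str:
    """Wrap a whole JSON document as the string value of one column.

    The output is a single JSON Lines record with one STRING field.
    """
    compact = json.dumps(json.loads(raw_json), separators=(",", ":"))
    # The outer dumps escapes the inner quotes, so the document
    # survives as plain text inside the column.
    return json.dumps({column: compact})


doc = '{"property": [{"NAME": "65874aca2143", "VALUE": [{"NAME": "type", "VALUE": "ACTION"}]}]}'
row = wrap_as_string_row(doc)
print(row)
```

Once such rows are loaded, individual values could be read with a JSON function, for example `JSON_EXTRACT_SCALAR(raw, '$.property[0].NAME')`, assuming the column is named `raw`.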
