BigQuery: Create column of JSON datatype


Problem description

I am trying to load JSON with the following schema into BigQuery:

{
key_a:value_a,
key_b:{
   key_c:value_c,
   key_d:value_d
  },
key_e:{
   key_f:value_f,
   key_g:value_g
  }
}

The keys under key_e are dynamic, i.e., in one response key_e will contain key_f and key_g, and in another response it will instead contain key_h and key_i. New keys can be created at any time, so I cannot create a record with nullable fields for all possible keys.

Instead I want to create a column with JSON datatype that can then be queried using the JSON_EXTRACT() function. I have tried loading key_e as a column with STRING datatype but value_e is analysed as JSON and so fails.

How can I load a section of JSON into a single BigQuery column when there is no JSON datatype?

Solution

Storing your JSON as a single string column in BigQuery is definitely an option. With a large volume of data, however, this can lead to high query costs, because all of your data ends up in one column, and the query logic itself can become quite messy.
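One way to make the single-string-column option work is to re-encode the dynamic object as a JSON string before loading, so the loader sees a plain STRING value rather than a nested object. The following is only a minimal sketch under that assumption; the helper names `stringify_field` and `to_ndjson` are made up for illustration and are not part of any BigQuery tooling.

```python
import json


def stringify_field(record: dict, field: str) -> dict:
    """Return a copy of `record` with `record[field]` re-encoded as a JSON string."""
    out = dict(record)
    if field in out and not isinstance(out[field], str):
        # The nested object becomes an ordinary string value.
        out[field] = json.dumps(out[field])
    return out


def to_ndjson(records, field="key_e") -> str:
    """Serialize records as newline-delimited JSON, one record per line."""
    return "\n".join(json.dumps(stringify_field(r, field)) for r in records)


# Hypothetical input rows modelled on the question's structure.
rows = [
    {"key_a": "value_a1", "key_e": {"key_f": "value_f", "key_g": "value_g"}},
    {"key_a": "value_a2", "key_e": {"key_h": "value_h", "key_i": "value_i"}},
]

ndjson_text = to_ndjson(rows)
print(ndjson_text)
```

Once loaded this way, key_e should be an ordinary STRING column whose contents can be queried with something like JSON_EXTRACT(key_e, '$.key_f'), as the question intends.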

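If you have the luxury of slightly changing your "design", I would recommend considering the schema below, which uses REPEATED mode: each dynamic key becomes a key/value pair in a repeated record, so new keys do not require a schema change.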

Table schema:

[
  { "name": "key_a",
    "type": "STRING" },
  { "name": "key_b",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      { "name": "key",
        "type": "STRING"},
      { "name": "value",
        "type": "STRING"}
    ]
  },
  { "name": "key_e",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      { "name": "key",
        "type": "STRING"},
      { "name": "value",
        "type": "STRING"}
    ]
  }
]

Example of JSON to load

{"key_a": "value_a1", "key_b": [{"key": "key_c", "value": "value_c"}, {"key": "key_d", "value": "value_d"}], "key_e": [{"key": "key_f", "value": "value_f"}, {"key": "key_g", "value": "value_g"}]}
{"key_a": "value_a2", "key_b": [{"key": "key_x", "value": "value_x"}, {"key": "key_y", "value": "value_y"}], "key_e": [{"key": "key_h", "value": "value_h"}, {"key": "key_i", "value": "value_i"}]}

Please note: the file must be newline-delimited JSON, so each row must be on a single line.
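If the source data arrives in the original nested shape, it has to be reshaped into key/value pairs before loading. The sketch below shows one possible way to do that in Python; the function names `to_key_value_pairs` and `reshape` are hypothetical, and non-string values are JSON-encoded on the assumption that the value field stays a STRING.

```python
import json


def to_key_value_pairs(obj: dict) -> list:
    """Turn {"k": "v", ...} into [{"key": "k", "value": "v"}, ...]."""
    return [
        # Keep strings as-is; JSON-encode anything else so it fits a STRING field.
        {"key": k, "value": v if isinstance(v, str) else json.dumps(v)}
        for k, v in obj.items()
    ]


def reshape(record: dict, fields=("key_b", "key_e")) -> dict:
    """Return a copy of `record` with the given object fields converted to pair lists."""
    out = dict(record)
    for field in fields:
        if isinstance(out.get(field), dict):
            out[field] = to_key_value_pairs(out[field])
    return out


# Input in the shape described in the question.
original = {
    "key_a": "value_a1",
    "key_b": {"key_c": "value_c", "key_d": "value_d"},
    "key_e": {"key_f": "value_f", "key_g": "value_g"},
}

reshaped = reshape(original)
line = json.dumps(reshaped)  # json.dumps emits a single line by default
print(line)
```

Writing one such line per record produces a newline-delimited file matching the schema above.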
