BigQuery: Create column of JSON datatype


Question


I am trying to load json with the following schema into BigQuery:

{
  "key_a": "value_a",
  "key_b": {
    "key_c": "value_c",
    "key_d": "value_d"
  },
  "key_e": {
    "key_f": "value_f",
    "key_g": "value_g"
  }
}

The keys under key_e are dynamic, i.e. in one response key_e will contain key_f and key_g, and in another it will instead contain key_h and key_i. New keys can be created at any time, so I cannot create a record with nullable fields for all possible keys.

Instead, I want to create a column with a JSON datatype that can then be queried using the JSON_EXTRACT() function. I have tried loading key_e as a column with STRING datatype, but its value is parsed as JSON (an object rather than a string), so the load fails.

How can I load a section of JSON into a single BigQuery column when there is no JSON datatype?

Solution

Storing your JSON as a single STRING column in BigQuery is definitely an option. With a large volume of data, however, queries can become expensive, because all your data ends up in one column, and the query logic itself can get quite messy.
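One way to realize the single-column option is to serialize the dynamic object to a JSON string before loading, so that the loader sees a plain string value instead of a nested object. The following is a minimal Python sketch, assuming the target column key_e is declared as STRING; the helper name stringify_field is hypothetical and not part of any BigQuery API.

```python
import json

def stringify_field(row, field):
    """Return a copy of row with row[field] re-encoded as a JSON string."""
    out = dict(row)
    if field in out and not isinstance(out[field], str):
        # sort_keys only makes the output stable; it is not required.
        out[field] = json.dumps(out[field], sort_keys=True)
    return out

row = {
    "key_a": "value_a",
    "key_b": {"key_c": "value_c", "key_d": "value_d"},
    "key_e": {"key_f": "value_f", "key_g": "value_g"},
}

# One line of newline-delimited JSON; key_e is now a string value.
ndjson_line = json.dumps(stringify_field(row, "key_e"))
print(ndjson_line)
```

Once loaded this way, the column could be queried with something like JSON_EXTRACT(key_e, '$.key_f'), at the cost noted above: every query scans the whole string column.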

If you have the luxury of slightly changing your design, I would recommend considering the approach below, which uses REPEATED mode.

Table schema:

[
  { "name": "key_a",
    "type": "STRING" },
  { "name": "key_b",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      { "name": "key",
        "type": "STRING"},
      { "name": "value",
        "type": "STRING"}
    ]
  },
  { "name": "key_e",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      { "name": "key",
        "type": "STRING"},
      { "name": "value",
        "type": "STRING"}
    ]
  }
]

Example of JSON to load

{"key_a": "value_a1", "key_b": [{"key": "key_c", "value": "value_c"}, {"key": "key_d", "value": "value_d"}], "key_e": [{"key": "key_f", "value": "value_f"}, {"key": "key_g", "value": "value_g"}]}
{"key_a": "value_a2", "key_b": [{"key": "key_x", "value": "value_x"}, {"key": "key_y", "value": "value_y"}], "key_e": [{"key": "key_h", "value": "value_h"}, {"key": "key_i", "value": "value_i"}]}

Please note: the file must be newline-delimited JSON, so each row must be on a single line.
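If the source data arrives in the original nested shape, it needs to be reshaped into key/value pairs before loading. The following is a rough Python sketch of that conversion, assuming key_b and key_e are the only dynamic objects; the function names to_key_value_pairs and reshape are hypothetical, and non-string values are JSON-encoded because the schema declares value as STRING.

```python
import json

def to_key_value_pairs(obj):
    """Turn {"k": "v", ...} into [{"key": "k", "value": "v"}, ...]."""
    return [
        # Non-string values are JSON-encoded to fit the STRING column.
        {"key": k, "value": v if isinstance(v, str) else json.dumps(v)}
        for k, v in obj.items()
    ]

def reshape(row, repeated_fields=("key_b", "key_e")):
    """Return a copy of row with each dynamic object as a key/value list."""
    out = dict(row)
    for name in repeated_fields:
        out[name] = to_key_value_pairs(row.get(name) or {})
    return out

rows = [
    {"key_a": "value_a1",
     "key_b": {"key_c": "value_c", "key_d": "value_d"},
     "key_e": {"key_f": "value_f", "key_g": "value_g"}},
    {"key_a": "value_a2",
     "key_b": {"key_x": "value_x", "key_y": "value_y"},
     "key_e": {"key_h": "value_h", "key_i": "value_i"}},
]

# json.dumps emits no line breaks by default, so each row stays on one line.
ndjson = "\n".join(json.dumps(reshape(r)) for r in rows)
print(ndjson)
```

The printed lines have the same shape as the example rows above and can be written to a file for a load job.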
