How to manage/handle schema changes while loading a JSON file into a BigQuery table

Question

Here is what my input file looks like:

{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}
{"Id": 2, "Address": {"City":"Mumbai"}}
{"Id": 3, "Address": {"Street":"XYZ Road"}}
{"Id": 4}
{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}

In my Dataflow pipeline, how can I dynamically determine which fields are present in each row so that the output adheres to the BigQuery table schema? For example, in row 2, Street is missing. I want the entry for column Address.Street in BigQuery to be "N/A" or null, and I don't want the pipeline to fail because of a schema change or missing data.

How can I handle this logic in Python in my Dataflow job before writing to BigQuery?

Recommended answer

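I'd recommend first bringing your data into a BigQuery temp table with just one field of type string, so that each whole row lands in that single field.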

Once the data is in the BigQuery temp table, you can apply the schema logic and query the data out of the temp table into your final table.
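The staging step described above could be prepared on the Python side roughly as follows. This is only a minimal sketch: it assumes the staging table has a single STRING column named `line` (the name used in the query further down), and the helper `to_staging_rows` is a hypothetical name, not part of any library.

```python
def to_staging_rows(raw_lines):
    """Wrap each raw JSON line as a one-column row for a staging table.

    Assumption: the staging table has one STRING column called "line".
    The JSON text is not parsed here, so missing or extra fields cannot
    break this step.
    """
    rows = []
    for raw in raw_lines:
        text = raw.strip()
        if text:  # skip blank lines
            rows.append({"line": text})
    return rows


sample = [
    '{"Id": 4}  \n',
    "",
    '{"Id": 2, "Address": {"City":"Mumbai"}}\n',
]
staging_rows = to_staging_rows(sample)
```

In an Apache Beam pipeline, the same wrapping could plausibly be applied per element (for example with `beam.Map`) before the write step; the exact pipeline wiring depends on your job and is not shown here.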

The example below, in BigQuery Standard SQL, shows how to apply schema logic to a table that holds the whole row in one field:

#standardSQL
-- Sample data: each input row is stored as one JSON string in the field `line`
WITH t AS (
  SELECT '{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}' line UNION ALL
  SELECT '{"Id": 2, "Address": {"City":"Mumbai"}}' UNION ALL
  SELECT '{"Id": 3, "Address": {"Street":"XYZ Road"}}' UNION ALL
  SELECT '{"Id": 4}' UNION ALL
  SELECT '{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}'
)
-- JSON_EXTRACT_SCALAR returns NULL when the path is absent from a row
SELECT
  JSON_EXTRACT_SCALAR(line, '$.Id') id,
  JSON_EXTRACT_SCALAR(line, '$.PhoneNumber') PhoneNumber,
  JSON_EXTRACT_SCALAR(line, '$.Address.Street') Street,
  JSON_EXTRACT_SCALAR(line, '$.Address.City') City
FROM t

The result is as follows:

Row id  PhoneNumber Street      City     
1   1   null        MG Road     Pune     
2   2   null        null        Mumbai   
3   3   null        XYZ Road    null     
4   4   null        null        null     
5   5   12345678    ABCD Road   Bangalore      
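Since the question asks about doing this in Python before the write, the same "missing field becomes null" logic can also be sketched with the standard `json` module. This is an illustration under assumptions, not the answer's method: the function name `normalize` and the flattened output keys are hypothetical, and unlike `JSON_EXTRACT_SCALAR` (which returns strings) it keeps the original JSON types.

```python
import json


def normalize(raw_line, default=None):
    """Parse one JSON line and fill absent fields with `default`.

    Hypothetical helper: the output keys mirror the columns of the
    query result above (Id, PhoneNumber, Street, City).
    """
    record = json.loads(raw_line)
    # A row may have no "Address" at all, so fall back to an empty dict.
    address = record.get("Address") or {}
    return {
        "Id": record.get("Id", default),
        "PhoneNumber": record.get("PhoneNumber", default),
        "Street": address.get("Street", default),
        "City": address.get("City", default),
    }


lines = [
    '{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}',
    '{"Id": 2, "Address": {"City":"Mumbai"}}',
    '{"Id": 3, "Address": {"Street":"XYZ Road"}}',
    '{"Id": 4}',
    '{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}',
]
rows = [normalize(line) for line in lines]
```

Passing `default="N/A"` instead of `None` gives the "N/A" placeholder mentioned in the question, although a single placeholder string only suits string-typed columns.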
