How to insert/append unstructured data to a BigQuery table
Problem description
Background
I want to insert/append newline-delimited JSON into a BigQuery table through the Python client API.
For example:
{"name":"xyz","mobile":"xxx","location":"abc"}
{"name":"xyz","mobile":"xxx","age":22}
The issue is that every field in a row is optional, and the data has no fixed, predefined schema.
Question
I have read that federated tables can be used, and that they support schema auto-detection.
However, I am looking for a feature that automatically detects the schema from the data, creates the table accordingly, and even adjusts the table schema when extra columns/keys appear in the data, instead of creating a new table.
Is this possible with the Python client API?
Recommended answer
You can use autodetect with the BigQuery load API. For example, with the bq CLI tool your example looks like this:
~$ cat /tmp/x.json
{"name":"xyz","mobile":"xxx","location":"abc"}
{"name":"xyz","mobile":"xxx","age":"22"}
~$ bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON tmp.x /tmp/x.json
Upload complete.
~$ bq show tmp.x
Table tmp.x
   Last modified          Schema          Total Rows   Total Bytes   Expiration
 ----------------- --------------------- ------------ ------------- ------------
  16 Aug 08:23:35   |- age: integer       2            33
                    |- location: string
                    |- mobile: string
                    |- name: string
~$ bq query "select * from tmp.x"
+------+----------+--------+------+
| age | location | mobile | name |
+------+----------+--------+------+
| NULL | abc | xxx | xyz |
| 22 | NULL | xxx | xyz |
+------+----------+--------+------+
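The result shows that autodetect builds the table from the union of keys across all rows and leaves a column NULL wherever a row lacks that key. To make the idea concrete, here is a minimal stdlib-only Python sketch of that kind of inference. It is a simplified illustration under stated assumptions (numeric-looking strings become INTEGER, conflicting types widen to STRING), not BigQuery's actual autodetect algorithm, and infer_type/infer_schema are hypothetical names. With the real Python client (google-cloud-bigquery), the equivalent of the bq command above should be a load job configured with LoadJobConfig(autodetect=True, source_format=SourceFormat.NEWLINE_DELIMITED_JSON), which leaves the inference to BigQuery itself.

```python
import json


def infer_type(value):
    """Map one JSON value to a BigQuery-style type name (simplified, hypothetical)."""
    # bool must be checked before int: in Python, True is an instance of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        # Assumption: integer-looking strings count as INTEGER, matching how
        # "22" came out as age: integer in the bq output above.
        try:
            int(value)
            return "INTEGER"
        except ValueError:
            return "STRING"
    # Nested objects and arrays are out of scope for this sketch.
    return "STRING"


def infer_schema(ndjson_text):
    """Return {field: type} over the union of keys found in all rows."""
    schema = {}
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        for key, value in row.items():
            if value is None:
                continue  # a JSON null says nothing about the column type
            new_type = infer_type(value)
            old_type = schema.get(key)
            if old_type is None:
                schema[key] = new_type  # first time this key is seen
            elif old_type != new_type:
                schema[key] = "STRING"  # conflicting types: widen to STRING (assumption)
    return schema


if __name__ == "__main__":
    data = (
        '{"name":"xyz","mobile":"xxx","location":"abc"}\n'
        '{"name":"xyz","mobile":"xxx","age":"22"}\n'
    )
    print(infer_schema(data))
    # {'name': 'STRING', 'mobile': 'STRING', 'location': 'STRING', 'age': 'INTEGER'}
```

Note that the sketch keeps keys in first-seen order, whereas the bq show output above lists the columns in a different order; column order is not something this illustration tries to reproduce.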
Update: If you later need to add more fields, you can use --schema_update_option to allow new fields. Alas, it doesn't yet work together with autodetect, so you need to pass the new schema explicitly to the load API:
~$ cat /tmp/x1.json
{"name":"abc","mobile":"yyy","age":"25","gender":"male"}
~$ bq load --schema=name:STRING,age:INTEGER,location:STRING,mobile:STRING,gender:STRING --schema_update_option=ALLOW_FIELD_ADDITION --source_format=NEWLINE_DELIMITED_JSON tmp.x /tmp/x1.json
Upload complete.
~$ bq show tmp.x
Table tmp.x
   Last modified          Schema          Total Rows   Total Bytes   Expiration
 ----------------- --------------------- ------------ ------------- ------------
  19 Aug 10:43:09   |- name: string       3            57
                    |- age: integer
                    |- location: string
                    |- mobile: string
                    |- gender: string
~$ bq query "select * from tmp.x"
status: DONE
+------+------+----------+--------+--------+
| name | age | location | mobile | gender |
+------+------+----------+--------+--------+
| abc | 25 | NULL | yyy | male |
| xyz | NULL | abc | xxx | NULL |
| xyz | 22 | NULL | xxx | NULL |
+------+------+----------+--------+--------+
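Since autodetect could not be combined with ALLOW_FIELD_ADDITION at the time, the complete schema (existing columns plus the new ones) has to be assembled by hand for --schema. A minimal stdlib-only Python sketch of that bookkeeping, assuming schemas are kept as (name, type) pairs; merge_schema and to_schema_arg are hypothetical helpers, not part of bq or the client library. With google-cloud-bigquery, the merged fields would instead go into LoadJobConfig.schema as SchemaField objects, together with schema_update_options=[SchemaUpdateOption.ALLOW_FIELD_ADDITION].

```python
# Hypothetical helpers: not part of bq or the BigQuery client library.


def merge_schema(existing, incoming):
    """Keep existing fields in order and append fields seen only in incoming.

    Both arguments are lists of (name, type) pairs.
    """
    merged = list(existing)
    known = dict(existing)
    for name, field_type in incoming:
        if name not in known:
            merged.append((name, field_type))  # new column: append at the end
            known[name] = field_type
        elif known[name] != field_type:
            # ALLOW_FIELD_ADDITION adds columns; it does not change an
            # existing column's type, so surface the conflict instead.
            raise ValueError(f"type change for {name!r}: {known[name]} -> {field_type}")
    return merged


def to_schema_arg(fields):
    """Render (name, type) pairs in the inline NAME:TYPE,... form passed to --schema."""
    return ",".join(f"{name}:{field_type}" for name, field_type in fields)


if __name__ == "__main__":
    existing = [("name", "STRING"), ("age", "INTEGER"),
                ("location", "STRING"), ("mobile", "STRING")]
    incoming = [("name", "STRING"), ("mobile", "STRING"),
                ("age", "INTEGER"), ("gender", "STRING")]
    print(to_schema_arg(merge_schema(existing, incoming)))
    # name:STRING,age:INTEGER,location:STRING,mobile:STRING,gender:STRING
```

Keeping the existing columns first and appending new ones reproduces the --schema value used in the command above, and raising on a type change flags conflicts that field addition alone cannot resolve.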