How to insert/append unstructured data to a BigQuery table


Question

Background

I want to insert/append newline-delimited JSON into a BigQuery table through the Python client API.

For example:

{"name":"xyz","mobile":"xxx","location":"abc"}
{"name":"xyz","mobile":"xxx","age":22}

The issue is that every field in a row is optional and the data has no fixed schema.

Query

I have read that we can use federated tables, which support schema auto-detection.

However, I am looking for a feature that would automatically detect the schema from the data, create tables accordingly, and even adjust the table schema when extra columns/keys appear in the data, instead of creating a new table.

Would this be possible using the Python client API?

Recommended answer

You can use autodetect with the BigQuery load API. For instance, with the bq CLI tool your example would look like the following:

~$ cat /tmp/x.json
{"name":"xyz","mobile":"xxx","location":"abc"}
{"name":"xyz","mobile":"xxx","age":"22"}

~$ bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON tmp.x /tmp/x.json
Upload complete.

~$ bq show tmp.x
Table tmp.x

   Last modified          Schema          Total Rows   Total Bytes   Expiration  
 ----------------- --------------------- ------------ ------------- ------------ 
  16 Aug 08:23:35   |- age: integer       2            33                        
                    |- location: string                                          
                    |- mobile: string                                            
                    |- name: string                                              


~$ bq query "select * from tmp.x"

+------+----------+--------+------+
| age  | location | mobile | name |
+------+----------+--------+------+
| NULL | abc      | xxx    | xyz  |
|   22 | NULL     | xxx    | xyz  |
+------+----------+--------+------+
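
Since the question asks about the Python client, the same autodetect load can probably be expressed with the google-cloud-bigquery package. The following is a minimal sketch and not part of the original answer: it assumes the package is installed and credentials are configured, and the function name load_ndjson_with_autodetect is hypothetical.

```python
import io


def load_ndjson_with_autodetect(table_id, ndjson_bytes, client=None):
    """Load newline-delimited JSON into table_id and let BigQuery infer the schema.

    table_id is a string such as "tmp.x" or "project.dataset.table".
    """
    # Imported here so this module can be imported even where the
    # google-cloud-bigquery package is not installed.
    from google.cloud import bigquery

    if client is None:
        client = bigquery.Client()  # uses the default project and credentials

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # counterpart of `bq load --autodetect`
    )
    job = client.load_table_from_file(
        io.BytesIO(ndjson_bytes), table_id, job_config=job_config
    )
    return job.result()  # blocks until the load job finishes, raises on failure
```

A call could look like `load_ndjson_with_autodetect("tmp.x", open("/tmp/x.json", "rb").read())`, which mirrors the bq load command above.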

Update: If you later need to add fields, you can use schema_update_option to allow new fields. Alas, it does not yet work with autodetect, so you need to provide the new schema explicitly to the load API:

~$ cat /tmp/x1.json 
{"name":"abc","mobile":"yyy","age":"25","gender":"male"}

~$ bq load --schema=name:STRING,age:INTEGER,location:STRING,mobile:STRING,gender:STRING --schema_update_option=ALLOW_FIELD_ADDITION --source_format=NEWLINE_DELIMITED_JSON tmp.x /tmp/x1.json
Upload complete.

~$ bq show tmp.x
Table tmp.x

   Last modified          Schema          Total Rows   Total Bytes   Expiration  
 ----------------- --------------------- ------------ ------------- -----------
  19 Aug 10:43:09   |- name: string       3            57                        
                    |- age: integer                                              
                    |- location: string                                          
                    |- mobile: string                                            
                    |- gender: string                                            


~$ bq query "select * from tmp.x"
status: DONE   
+------+------+----------+--------+--------+
| name | age  | location | mobile | gender |
+------+------+----------+--------+--------+
| abc  |   25 | NULL     | yyy    | male   |
| xyz  | NULL | abc      | xxx    | NULL   |
| xyz  |   22 | NULL     | xxx    | NULL   |
+------+------+----------+--------+--------+
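
Because the new schema has to be supplied explicitly, one option is to compute it from the current schema plus the keys found in the incoming rows. The sketch below is an illustration rather than part of the original answer: merge_schema, guess_type and to_bq_schema_flag are hypothetical helper names, the type guess is a simplification and not BigQuery's own autodetect logic, and only flat rows are handled.

```python
import json


def guess_type(value):
    """Guess a BigQuery type name for a flat JSON value (a simplification)."""
    if isinstance(value, (dict, list)):
        raise TypeError("nested objects and arrays are not handled in this sketch")
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def merge_schema(existing, ndjson_text):
    """Return the existing (name, type) pairs plus any new keys seen in the rows.

    Existing columns keep their type and position; new keys are appended in
    the order they are first seen with a non-null value.
    """
    merged = dict(existing)  # dicts preserve insertion order
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        for key, value in json.loads(line).items():
            if key not in merged and value is not None:
                merged[key] = guess_type(value)
    return list(merged.items())


def to_bq_schema_flag(fields):
    """Format (name, type) pairs the way `bq load --schema=` expects them."""
    return ",".join(f"{name}:{type_}" for name, type_ in fields)


current = [("name", "STRING"), ("age", "INTEGER"),
           ("location", "STRING"), ("mobile", "STRING")]
new_rows = '{"name":"abc","mobile":"yyy","age":"25","gender":"male"}\n'
print(to_bq_schema_flag(merge_schema(current, new_rows)))
# prints: name:STRING,age:INTEGER,location:STRING,mobile:STRING,gender:STRING
```

With the Python client, the merged pairs could be turned into bigquery.SchemaField objects and passed as schema in a LoadJobConfig, together with schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION] and write_disposition=bigquery.WriteDisposition.WRITE_APPEND.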
