BigQuery streaming inserts using Dataflow with null fields

Question

We are using BigQuery streaming inserts with Dataflow, via the predefined Dataflow job template.

I ran into some peculiarities when using this with nullable and repeated fields.

For example, with the schema

name   STRING, NULLABLE

trying to insert {name: null} fails with the error:

generic::invalid_argument: This field is not a record.","location":"name","message":"This field is not a record."

This is not such a big deal since it's easy enough to simply drop null fields, and similarly for empty arrays.
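A minimal sketch of that workaround, assuming rows are built as plain Python dicts before they reach the streaming insert. `drop_nulls` is a hypothetical helper, not part of the Dataflow template or any BigQuery client library. It relies on the behaviour described in the answer below: an omitted NULLABLE field is stored as NULL.

```python
from typing import Any


def drop_nulls(value: Any) -> Any:
    """Recursively remove None-valued keys and empty lists from a row.

    Hypothetical helper: omitting a NULLABLE field makes BigQuery store
    NULL, so dropping the key avoids sending an explicit null.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, field in value.items():
            field = drop_nulls(field)
            # Skip keys whose value is None or an empty array.
            if field is None or field == []:
                continue
            cleaned[key] = field
        return cleaned
    if isinstance(value, list):
        # Clean nested records inside arrays. None *elements* are left
        # untouched here; BigQuery rejects them in a REPEATED field anyway.
        return [drop_nulls(item) for item in value]
    return value


row = {"name": None, "tags": [], "id": 1, "nested": {"a": None, "b": "x"}}
print(drop_nulls(row))  # {'id': 1, 'nested': {'b': 'x'}}
```

The helper builds new dicts and lists rather than mutating the input, so the original row stays available for logging or dead-lettering.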

However, if our schema is instead:

name   STRING, REPEATED

and we want to insert ["a", "b", null, "c"], we get a similar error referencing the third element.

Recommended answer

To provide a row with a null value for a NULLABLE field, simply omit the field from the row that you are inserting. In your second example, a REPEATED field (an ARRAY in SQL terms) cannot contain a null element. To model an array of NULLABLE STRING, you can use a REPEATED RECORD that contains a STRING field named, for instance, value; in SQL terms this is equivalent to ARRAY<STRUCT<value STRING>>.
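A sketch of the REPEATED RECORD approach, assuming the field keeps the name `name` and the inner field is called `value` as in the answer. `NAME_FIELD_SCHEMA` and `to_repeated_record` are hypothetical names. The schema dict follows BigQuery's JSON schema format (`name`, `type`, `mode`, `fields`). Encoding a null element as an empty struct, with `value` omitted so it becomes NULL, is an assumption that applies the "omit the field" rule to the subfield.

```python
from typing import Any, Dict, List, Optional

# ARRAY<STRUCT<value STRING>> expressed in BigQuery's JSON schema format.
NAME_FIELD_SCHEMA: Dict[str, Any] = {
    "name": "name",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
        {"name": "value", "type": "STRING", "mode": "NULLABLE"},
    ],
}


def to_repeated_record(values: List[Optional[str]]) -> List[Dict[str, str]]:
    """Wrap each array element in a {"value": ...} struct.

    A None element becomes an empty struct: the NULLABLE "value" subfield
    is omitted (and therefore NULL), while the array position is kept.
    """
    return [{} if v is None else {"value": v} for v in values]


row = {"name": to_repeated_record(["a", "b", None, "c"])}
print(row)
# {'name': [{'value': 'a'}, {'value': 'b'}, {}, {'value': 'c'}]}
```

Wrapping every element, not only the null ones, keeps the column shape uniform, so queries can always read `name[OFFSET(i)].value`.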
