BigQuery:如何将字段添加到REPEATED记录中? [英] BigQuery: How do I add a field to a REPEATED record?

查看：35 发布时间：2021/5/12 18:34:25 google-bigquery

本文介绍了BigQuery:如何将字段添加到REPEATED记录中?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Google BigQuery中有一个表，该表由几个字段组成，然后是一个REPEATED记录，其中可能包含一个或多个对象.我想在REPEATED数据中创建一个带有额外字段的新表，并将我的原始数据复制到新表中，并使用GENERATE_UUID()的输出填充新字段，以便每条REPEATED数据行都有一个唯一的标识符./p>

我在

  [{"name":"id"，"type":"NUMERIC"，"mode":"REQUIRED"}}，{名称":名称"，类型":"STRING"，模式":必需"}，{名称":创建"，类型":"TIMESTAMP"，模式":必需"}，{名称":有效"，类型":布尔"，模式":必需"}，{名称":参数"，类型":记录"，模式":重复"，字段":[{名称":"parameter1"，类型":"STRING"，模式":必需"}，{"name":"parameter2"，"type":"FLOAT"，"mode":"REQUIRED"}}，{名称":"parameter3"，类型":"BOOLEAN"，模式":必需"}]}]

并且我希望它最终像这样，模式2:

  [{"name":"id"，"type":"NUMERIC"，"mode":"REQUIRED"}}，{名称":名称"，类型":"STRING"，模式":必需"}，{名称":创建"，类型":"TIMESTAMP"，模式":必需"}，{名称":有效"，类型":布尔"，模式":必需"}，{名称":参数"，类型":记录"，模式":重复"，字段":[{"name":"uuid"，"type":"STRING"，"mode":"REQUIRED"}}，{名称":"parameter1"，类型":"STRING"，模式":必需"}，{"name":"parameter2"，"type":"FLOAT"，"mode":"REQUIRED"}}，{名称":"parameter3"，类型":"BOOLEAN"，模式":必需"}]}]

因此，我已经用该模式创建了新表(表2).我想从表1中复制内容，并且正在尝试执行以下操作:

 插入table2_with_uuid(id，名称，创建的，有效的，参数)选择ID，名称，创建的，有效的，[(GENERATE_UUID()，parameters.parameter1，parameters.parameter2，parameters.parameter3)]来自table1_no_guid;

这给了我一个错误:无法访问类型为ARRAY< STRUCT< parameter1 (等)的值的字段ceId

有人对如何进行有任何建议吗?谢谢！

解决方案

我已按照更新重复记录.从插入到更新，直到将第二条注释添加到重复记录的那一刻，我都遵循了所有示例.

然后我应用了UNNEST查询:

 插入`testing.followingDMLmod`(产品，数量，供应限制，注释)选择产品，数量，受供应限制，[(GENERATE_UUID()，com.created，com.comment)]来自`testing.followingDML`，UNNEST(comments)com;

当然可以，但是不能提供所需的结果.

根据官方文档"BigQuery本机支持多种模式更改，例如向记录添加新的嵌套字段或放宽嵌套字段的模式."然后，也许路径是在复制表，然后再添加额外的字段.

可以在管理表架构文档之后完成.也就是说，可以使用API并调用 tables.patch ，在此处):

  bq show \-模式--format = prettyjson \testing.followingDMLmod>myschema.json

在文本编辑器中打开架构.例如运行

  vim myschema.json

现在修改架构，以将新的嵌套列添加到fields数组的末尾.(如果您从未使用过vim，则非常简单的解释是"esc"将使您返回普通模式，而在普通模式下，单击"i"允许您写入打开的文件，:w"保存文件，而":q退出文件)我加入了"id"字段:

  {"mode":"NULLABLE"，"name":"id"，"type":"STRING"}

现在您需要通过运行

来更新架构

  bq更新testing.followingDMLmod myschema.json

最后，回到BigQuery用户界面

  UPDATE`testing.followingDMLmod`SET注释= ARRAY(选择AS STRUCT * REPLACE(GENERATE_UUID()AS ID)来自UNNEST(评论))在哪里真实

创建后填充ID字段.遵循此堆栈溢出帖子中的建议.现在最终结果确实是预期的结果！

I've got a table in Google BigQuery that consists of a few fields, then a REPEATED record which may contain one or more objects. I want to create a new table with an extra field in the REPEATED data, and copy my original data into the new table, populating the new field with the output of GENERATE_UUID() so there is one unique identifier per REPEATED line of data.

I had a similar question at How do I copy from one BigQuery Table to another when the target contains REPEATED fields? but I don't know how to adapt this to fit my current use case.

Here's my "new" Schema 1 (ie Schema 2 from the above link)

[
    {"name": "id", "type": "NUMERIC", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "created", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "valid", "type": "BOOLEAN", "mode": "REQUIRED"},
    {"name": "parameters", "type": "RECORD", "mode": "REPEATED", "fields":
        [
            {"name": "parameter1", "type": "STRING", "mode": "REQUIRED"},
            {"name": "parameter2", "type": "FLOAT", "mode": "REQUIRED"},
            {"name": "parameter3", "type": "BOOLEAN", "mode": "REQUIRED"}
        ]
    }
]

and I'd like it to end up like this, Schema 2:

[
    {"name": "id", "type": "NUMERIC", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "created", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "valid", "type": "BOOLEAN", "mode": "REQUIRED"},
    {"name": "parameters", "type": "RECORD", "mode": "REPEATED", "fields":
        [
            {"name": "uuid", "type": "STRING", "mode": "REQUIRED"},
            {"name": "parameter1", "type": "STRING", "mode": "REQUIRED"},
            {"name": "parameter2", "type": "FLOAT", "mode": "REQUIRED"},
            {"name": "parameter3", "type": "BOOLEAN", "mode": "REQUIRED"}
        ]
    }
]

So I've got my new table (Table 2) created with this Schema. I want to copy from Table 1, and I'm trying something like this:

insert into table2_with_uuid(id, name, created, valid, parameters)
select id, name, created, valid,
[(
GENERATE_UUID(), parameters.parameter1, parameters.parameter2, parameters.parameter3
)]
from table1_no_guid;

This gives me an error saying: Cannot access field ceId on a value with type ARRAY<STRUCT<parameter1 (etc)

Does anyone have any suggestions as to how to proceed? Thanks!

解决方案

I have followed the procedure in Data Manipulation Language syntax on the official documentation.

Then, basically what you want is to update repeated records. I have followed all the examples, from the inserts to the updates up to the moment in which a second comment is added to the repeated record.

Then I applied the UNNEST query:

insert into `testing.followingDMLmod`  (product, quantity, supply_constrained, comments)
select product, quantity, supply_constrained,
[(
GENERATE_UUID(), com.created, com.comment
)]
from `testing.followingDML` , UNNEST(comments) com;

which of course works but does not provide the desired result.

As per the official documentation "BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode." Then, perhaps the path is copying the table and afterwards adding the extra field.

That can be done following managing table schemas documentation. That is, either using the API and calling tables.patch, which was discussed with more detail in this other stack overflow post, or using a JSON file with the schema from the command line.

I have personally followed the second approach (JSON schema file) and worked perfectly for me. With more detail the steps I followed are (as found in here):

Use Copy table in the BigQuery UI to get a replica of your table without "id". My starting table is followingDML and the copy followingDMLmod.
Copy the schema from your table into a JSON file (here called myschema.json) by running the following command in the Cloud Shell

bq show \
--schema \
--format=prettyjson \
testing.followingDMLmod > myschema.json

Open the schema in a text editor. For example running

vim myschema.json

Now modify the schema to add the new nested column to the end of the fields array. (If you have never used vim, a very simplified explanation would be "esc" returns you to normal mode, and while in normal mode clicking "i" allows you to write into the opened file, ":w" saves the file and ":q" exits the file) I included the field "id":

{
  "mode": "NULLABLE",
  "name": "id",
  "type": "STRING"
}

Now you need to update the schema by running

bq update testing.followingDMLmod myschema.json

Finally, back on the BigQuery UI I used the query

UPDATE `testing.followingDMLmod` 
SET comments = ARRAY(
    SELECT AS STRUCT * REPLACE(GENERATE_UUID() AS id)
    FROM UNNEST(comments)
  )
WHERE true

to populate the id field after creation. Following what is suggested in this stack overflow post. Now the end result is truly what it was expected!

这篇关于BigQuery:如何将字段添加到REPEATED记录中?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

BigQuery:如何将字段添加到REPEATED记录中? [英] BigQuery: How do I add a field to a REPEATED record?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

BigQuery:如何将字段添加到REPEATED记录中? [英] BigQuery: How do I add a field to a REPEATED record?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭