BigQuery:如何将字段添加到REPEATED记录中? [英] BigQuery: How do I add a field to a REPEATED record?
问题描述
我在Google BigQuery中有一个表,该表由几个字段组成,然后是一个REPEATED记录,其中可能包含一个或多个对象.我想在REPEATED数据中创建一个带有额外字段的新表,并将我的原始数据复制到新表中,并使用GENERATE_UUID()的输出填充新字段,以便每条REPEATED数据行都有一个唯一的标识符./p>
我在 并且我希望它最终像这样,模式2: 因此,我已经用该模式创建了新表(表2).我想从表1中复制内容,并且正在尝试执行以下操作: 这给了我一个错误: 有人对如何进行有任何建议吗?谢谢! 我已按照更新重复记录.从插入到更新,直到将第二条注释添加到重复记录的那一刻,我都遵循了所有示例. 然后我应用了UNNEST查询: 当然可以,但是不能提供所需的结果. 根据官方文档"BigQuery本机支持多种模式更改,例如向记录添加新的嵌套字段或放宽嵌套字段的模式."然后,也许路径是在复制表,然后再添加额外的字段. 可以在管理表架构文档之后完成.也就是说,可以使用API并调用 tables.patch ,在此处): 使用BigQuery UI中的复制"表来获取不带"id"的表的副本.我的起始表是followingDML,副本是followingDMLmod. 通过在Cloud Shell中运行以下命令将表中的架构复制到JSON文件(此处称为myschema.json)中
[{"name":"id","type":"NUMERIC","mode":"REQUIRED"}},{名称":名称",类型":"STRING",模式":必需"},{名称":创建",类型":"TIMESTAMP",模式":必需"},{名称":有效",类型":布尔",模式":必需"},{名称":参数",类型":记录",模式":重复",字段":[{名称":"parameter1",类型":"STRING",模式":必需"},{"name":"parameter2","type":"FLOAT","mode":"REQUIRED"}},{名称":"parameter3",类型":"BOOLEAN",模式":必需"}]}]
[{"name":"id","type":"NUMERIC","mode":"REQUIRED"}},{名称":名称",类型":"STRING",模式":必需"},{名称":创建",类型":"TIMESTAMP",模式":必需"},{名称":有效",类型":布尔",模式":必需"},{名称":参数",类型":记录",模式":重复",字段":[{"name":"uuid","type":"STRING","mode":"REQUIRED"}},{名称":"parameter1",类型":"STRING",模式":必需"},{"name":"parameter2","type":"FLOAT","mode":"REQUIRED"}},{名称":"parameter3",类型":"BOOLEAN",模式":必需"}]}]
插入table2_with_uuid(id,名称,创建的,有效的,参数)选择ID,名称,创建的,有效的,[(GENERATE_UUID(),parameters.parameter1,parameters.parameter2,parameters.parameter3)]来自table1_no_guid;
无法访问类型为ARRAY< STRUCT< parameter1
(等)的值的字段ceId
插入`testing.followingDMLmod`(产品,数量,供应限制,注释)选择产品,数量,受供应限制,[(GENERATE_UUID(),com.created,com.comment)]来自`testing.followingDML`,UNNEST(comments)com;
bq show \-模式--format = prettyjson \testing.followingDMLmod>myschema.json
- 在文本编辑器中打开架构.例如运行
vim myschema.json
- 现在修改架构,以将新的嵌套列添加到fields数组的末尾.(如果您从未使用过vim,则非常简单的解释是"esc"将使您返回普通模式,而在普通模式下,单击"i"允许您写入打开的文件,:w"保存文件,而":q退出文件)我加入了"id"字段:
{"mode":"NULLABLE","name":"id","type":"STRING"}
- 现在您需要通过运行 来更新架构
bq更新testing.followingDMLmod myschema.json
最后,回到BigQuery用户界面
UPDATE`testing.followingDMLmod`SET注释= ARRAY(选择AS STRUCT * REPLACE(GENERATE_UUID()AS ID)来自UNNEST(评论))在哪里真实
创建后填充ID字段.遵循此堆栈溢出帖子中的建议.现在最终结果确实是预期的结果!
I've got a table in Google BigQuery that consists of a few fields, then a REPEATED record which may contain one or more objects. I want to create a new table with an extra field in the REPEATED data, and copy my original data into the new table, populating the new field with the output of GENERATE_UUID() so there is one unique identifier per REPEATED line of data.
I had a similar question at How do I copy from one BigQuery Table to another when the target contains REPEATED fields? but I don't know how to adapt this to fit my current use case.
Here's my "new" Schema 1 (ie Schema 2 from the above link)
[
{"name": "id", "type": "NUMERIC", "mode": "REQUIRED"},
{"name": "name", "type": "STRING", "mode": "REQUIRED"},
{"name": "created", "type": "TIMESTAMP", "mode": "REQUIRED"},
{"name": "valid", "type": "BOOLEAN", "mode": "REQUIRED"},
{"name": "parameters", "type": "RECORD", "mode": "REPEATED", "fields":
[
{"name": "parameter1", "type": "STRING", "mode": "REQUIRED"},
{"name": "parameter2", "type": "FLOAT", "mode": "REQUIRED"},
{"name": "parameter3", "type": "BOOLEAN", "mode": "REQUIRED"}
]
}
]
and I'd like it to end up like this, Schema 2:
[
{"name": "id", "type": "NUMERIC", "mode": "REQUIRED"},
{"name": "name", "type": "STRING", "mode": "REQUIRED"},
{"name": "created", "type": "TIMESTAMP", "mode": "REQUIRED"},
{"name": "valid", "type": "BOOLEAN", "mode": "REQUIRED"},
{"name": "parameters", "type": "RECORD", "mode": "REPEATED", "fields":
[
{"name": "uuid", "type": "STRING", "mode": "REQUIRED"},
{"name": "parameter1", "type": "STRING", "mode": "REQUIRED"},
{"name": "parameter2", "type": "FLOAT", "mode": "REQUIRED"},
{"name": "parameter3", "type": "BOOLEAN", "mode": "REQUIRED"}
]
}
]
So I've got my new table (Table 2) created with this Schema. I want to copy from Table 1, and I'm trying something like this:
insert into table2_with_uuid(id, name, created, valid, parameters)
select id, name, created, valid,
[(
GENERATE_UUID(), parameters.parameter1, parameters.parameter2, parameters.parameter3
)]
from table1_no_guid;
This gives me an error saying:
Cannot access field ceId on a value with type ARRAY<STRUCT<parameter1
(etc)
Does anyone have any suggestions as to how to proceed? Thanks!
I have followed the procedure in Data Manipulation Language syntax on the official documentation.
Then, basically what you want is to update repeated records. I have followed all the examples, from the inserts to the updates up to the moment in which a second comment is added to the repeated record.
Then I applied the UNNEST query:
insert into `testing.followingDMLmod` (product, quantity, supply_constrained, comments)
select product, quantity, supply_constrained,
[(
GENERATE_UUID(), com.created, com.comment
)]
from `testing.followingDML` , UNNEST(comments) com;
which of course works but does not provide the desired result.
As per the official documentation "BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode." Then, perhaps the path is copying the table and afterwards adding the extra field.
That can be done following managing table schemas documentation. That is, either using the API and calling tables.patch, which was discussed with more detail in this other stack overflow post, or using a JSON file with the schema from the command line.
I have personally followed the second approach (JSON schema file) and worked perfectly for me. With more detail the steps I followed are (as found in here):
Use Copy table in the BigQuery UI to get a replica of your table without "id". My starting table is followingDML and the copy followingDMLmod.
Copy the schema from your table into a JSON file (here called myschema.json) by running the following command in the Cloud Shell
bq show \
--schema \
--format=prettyjson \
testing.followingDMLmod > myschema.json
- Open the schema in a text editor. For example running
vim myschema.json
- Now modify the schema to add the new nested column to the end of the fields array. (If you have never used vim, a very simplified explanation would be "esc" returns you to normal mode, and while in normal mode clicking "i" allows you to write into the opened file, ":w" saves the file and ":q" exits the file) I included the field "id":
{
"mode": "NULLABLE",
"name": "id",
"type": "STRING"
}
- Now you need to update the schema by running
bq update testing.followingDMLmod myschema.json
Finally, back on the BigQuery UI I used the query
UPDATE `testing.followingDMLmod`
SET comments = ARRAY(
SELECT AS STRUCT * REPLACE(GENERATE_UUID() AS id)
FROM UNNEST(comments)
)
WHERE true
to populate the id field after creation. Following what is suggested in this stack overflow post. Now the end result is truly what it was expected!
这篇关于BigQuery:如何将字段添加到REPEATED记录中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!