BigQuery:如何将字段添加到REPEATED记录中? [英] BigQuery: How do I add a field to a REPEATED record?

查看:35
本文介绍了BigQuery:如何将字段添加到REPEATED记录中?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Google BigQuery中有一个表,该表由几个字段组成,然后是一个REPEATED记录,其中可能包含一个或多个对象.我想在REPEATED数据中创建一个带有额外字段的新表,并将我的原始数据复制到新表中,并使用GENERATE_UUID()的输出填充新字段,以便每条REPEATED数据行都有一个唯一的标识符./p>

我在

  [{"name":"id","type":"NUMERIC","mode":"REQUIRED"}},{名称":名称",类型":"STRING",模式":必需"},{名称":创建",类型":"TIMESTAMP",模式":必需"},{名称":有效",类型":布尔",模式":必需"},{名称":参数",类型":记录",模式":重复",字段":[{名称":"parameter1",类型":"STRING",模式":必需"},{"name":"parameter2","type":"FLOAT","mode":"REQUIRED"}},{名称":"parameter3",类型":"BOOLEAN",模式":必需"}]}] 

并且我希望它最终像这样,模式2:

  [{"name":"id","type":"NUMERIC","mode":"REQUIRED"}},{名称":名称",类型":"STRING",模式":必需"},{名称":创建",类型":"TIMESTAMP",模式":必需"},{名称":有效",类型":布尔",模式":必需"},{名称":参数",类型":记录",模式":重复",字段":[{"name":"uuid","type":"STRING","mode":"REQUIRED"}},{名称":"parameter1",类型":"STRING",模式":必需"},{"name":"parameter2","type":"FLOAT","mode":"REQUIRED"}},{名称":"parameter3",类型":"BOOLEAN",模式":必需"}]}] 

因此,我已经用该模式创建了新表(表2).我想从表1中复制内容,并且正在尝试执行以下操作:

 插入table2_with_uuid(id,名称,创建的,有效的,参数)选择ID,名称,创建的,有效的,[(GENERATE_UUID(),parameters.parameter1,parameters.parameter2,parameters.parameter3)]来自table1_no_guid; 

这给了我一个错误:无法访问类型为ARRAY< STRUCT< parameter1 (等)的值的字段ceId

有人对如何进行有任何建议吗?谢谢!

我已按照更新重复记录.从插入到更新,直到将第二条注释添加到重复记录的那一刻,我都遵循了所有示例.

然后我应用了UNNEST查询:

 插入`testing.followingDMLmod`(产品,数量,供应限制,注释)选择产品,数量,受供应限制,[(GENERATE_UUID(),com.created,com.comment)]来自`testing.followingDML`,UNNEST(comments)com; 

当然可以,但是不能提供所需的结果.

根据官方文档"BigQuery本机支持多种模式更改,例如向记录添加新的嵌套字段或放宽嵌套字段的模式."然后,也许路径是在复制表,然后再添加额外的字段.

可以在管理表架构文档之后完成.也就是说,可以使用API​​并调用 tables.patch ,在此处):

  bq show \-模式--format = prettyjson \testing.followingDMLmod>myschema.json 

  vim myschema.json 

  {"mode":"NULLABLE","name":"id","type":"STRING"} 

  bq更新testing.followingDMLmod myschema.json 

最后,回到BigQuery用户界面

  UPDATE`testing.followingDMLmod`SET注释= ARRAY(选择AS STRUCT * REPLACE(GENERATE_UUID()AS ID)来自UNNEST(评论))在哪里真实 

创建后填充ID字段.遵循堆栈溢出帖子中的建议.现在最终结果确实是预期的结果!

I've got a table in Google BigQuery that consists of a few fields, then a REPEATED record which may contain one or more objects. I want to create a new table with an extra field in the REPEATED data, and copy my original data into the new table, populating the new field with the output of GENERATE_UUID() so there is one unique identifier per REPEATED line of data.

I had a similar question at How do I copy from one BigQuery Table to another when the target contains REPEATED fields? but I don't know how to adapt this to fit my current use case.

Here's my "new" Schema 1 (ie Schema 2 from the above link)

[
    {"name": "id", "type": "NUMERIC", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "created", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "valid", "type": "BOOLEAN", "mode": "REQUIRED"},
    {"name": "parameters", "type": "RECORD", "mode": "REPEATED", "fields":
        [
            {"name": "parameter1", "type": "STRING", "mode": "REQUIRED"},
            {"name": "parameter2", "type": "FLOAT", "mode": "REQUIRED"},
            {"name": "parameter3", "type": "BOOLEAN", "mode": "REQUIRED"}
        ]
    }
]

and I'd like it to end up like this, Schema 2:

[
    {"name": "id", "type": "NUMERIC", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "created", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "valid", "type": "BOOLEAN", "mode": "REQUIRED"},
    {"name": "parameters", "type": "RECORD", "mode": "REPEATED", "fields":
        [
            {"name": "uuid", "type": "STRING", "mode": "REQUIRED"},
            {"name": "parameter1", "type": "STRING", "mode": "REQUIRED"},
            {"name": "parameter2", "type": "FLOAT", "mode": "REQUIRED"},
            {"name": "parameter3", "type": "BOOLEAN", "mode": "REQUIRED"}
        ]
    }
]

So I've got my new table (Table 2) created with this Schema. I want to copy from Table 1, and I'm trying something like this:

insert into table2_with_uuid(id, name, created, valid, parameters)
select id, name, created, valid,
[(
GENERATE_UUID(), parameters.parameter1, parameters.parameter2, parameters.parameter3
)]
from table1_no_guid;

This gives me an error saying: Cannot access field ceId on a value with type ARRAY<STRUCT<parameter1 (etc)

Does anyone have any suggestions as to how to proceed? Thanks!

解决方案

I have followed the procedure in Data Manipulation Language syntax on the official documentation.

Then, basically what you want is to update repeated records. I have followed all the examples, from the inserts to the updates up to the moment in which a second comment is added to the repeated record.

Then I applied the UNNEST query:

insert into `testing.followingDMLmod`  (product, quantity, supply_constrained, comments)
select product, quantity, supply_constrained,
[(
GENERATE_UUID(), com.created, com.comment
)]
from `testing.followingDML` , UNNEST(comments) com;

which of course works but does not provide the desired result.

As per the official documentation "BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode." Then, perhaps the path is copying the table and afterwards adding the extra field.

That can be done following managing table schemas documentation. That is, either using the API and calling tables.patch, which was discussed with more detail in this other stack overflow post, or using a JSON file with the schema from the command line.

I have personally followed the second approach (JSON schema file) and worked perfectly for me. With more detail the steps I followed are (as found in here):

  • Use Copy table in the BigQuery UI to get a replica of your table without "id". My starting table is followingDML and the copy followingDMLmod.

  • Copy the schema from your table into a JSON file (here called myschema.json) by running the following command in the Cloud Shell

bq show \
--schema \
--format=prettyjson \
testing.followingDMLmod > myschema.json

  • Open the schema in a text editor. For example running

vim myschema.json

  • Now modify the schema to add the new nested column to the end of the fields array. (If you have never used vim, a very simplified explanation would be "esc" returns you to normal mode, and while in normal mode clicking "i" allows you to write into the opened file, ":w" saves the file and ":q" exits the file) I included the field "id":

{
  "mode": "NULLABLE",
  "name": "id",
  "type": "STRING"
}

  • Now you need to update the schema by running

bq update testing.followingDMLmod myschema.json

Finally, back on the BigQuery UI I used the query

UPDATE `testing.followingDMLmod` 
SET comments = ARRAY(
    SELECT AS STRUCT * REPLACE(GENERATE_UUID() AS id)
    FROM UNNEST(comments)
  )
WHERE true

to populate the id field after creation. Following what is suggested in this stack overflow post. Now the end result is truly what it was expected!

这篇关于BigQuery:如何将字段添加到REPEATED记录中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆