Error "Non-repeated field already set." when loading from Datastore into BigQuery


Problem description



[EDIT 20160426: This bug appears to have been solved now!]

[EDIT 20160219: Updated this question again, to reflect different error messages. See also the bug report I filed.]

We have a datastore table that contains a field category, of type Category, which is a custom class. The problem arises when we try to load this table into BigQuery (from a datastore backup). The resulting table should contain (simplified):

category.subfield1
,category.subfield2
,category.subfield3.subsubfield1
,category.subfield4
,category.subfield5

Instead, BigQuery wreaks havoc on the category field:

category_1.record.subfield1
,category_1.record.subfield2
,category_1.record.subfield3.subsubfield1
,category_1.entity.subfield1
,category_1.entity.subfield1
,category_1.entity.subfield3.subsubfield1
,category_1.entity.subfield4
,category_1.entity.subfield5
,category_1.provided

(A dozen or so __key__ subfields are omitted here for brevity.)

Before 20160219, the garbled output for the category field was even worse, but there was a workaround: explicitly listing all the fields, including category, through the projection_fields option. That is no longer possible, because it now produces a different error message: Field:category [...] Entity was of unexpected kind "__record__"
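For readers unfamiliar with the projection_fields workaround, here is a minimal sketch of what such a load request could look like when built as a raw BigQuery jobs.insert body, where the option is spelled projectionFields and accepts top-level property names only. The bucket path, project, dataset, table and the property names other than category are hypothetical placeholders, not the values used in the jobs listed in this question.

```python
import json


def build_load_body(source_uri, project_id, dataset_id, table_id,
                    projection_fields=None):
    """Build a jobs.insert request body for loading a Datastore backup."""
    load = {
        "sourceUris": [source_uri],
        "sourceFormat": "DATASTORE_BACKUP",
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
    }
    if projection_fields:
        # Only top-level Datastore property names belong here.
        load["projectionFields"] = list(projection_fields)
    return {"configuration": {"load": load}}


# All values below are hypothetical placeholders for illustration.
body = build_load_body(
    source_uri="gs://example-bucket/backups/example.MyKind.backup_info",
    project_id="example-project",
    dataset_id="example_dataset",
    table_id="my_kind",
    projection_fields=["name", "created", "category"],
)
print(json.dumps(body, indent=2))
```

Leaving projection_fields unset produces the plain load request, which corresponds to the "without projection_fields" case.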

Original job-ids:

project id: 711939958575
without projection_fields: job_Qw6-ygtZNFJ-Y7W0uLEqdvOrO_8
with projection_fields: job_lzzXo92lud9r5kvW7Z1kuzFLxS4

Solution

We came across the same problem when loading backups from Datastore into BigQuery. We had an 'Order' entity containing a nested 'Customer' entity. Ever since we added an index on one of the fields of the nested 'Customer' entity, we had been getting the "Non-repeated field already set" error from BigQuery.

The reason was that setting an index on a field in the nested entity (e.g. an index on the field email in Customer) created an index on the Order entity called customer.email. When the data is loaded into BigQuery, this results in two fields called customer.email: one from the nested entity and one from the index.
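The collision can be pictured with a small toy model. This is only an illustration of the naming clash described above, not BigQuery's actual loader logic: it assumes a backed-up entity can be viewed as a nested dict, and that the index contributes an extra top-level property whose name is already dotted. The helper names and sample data are made up for the example.

```python
from collections import Counter


def flatten(entity, prefix=""):
    """Yield a dotted column path for every leaf property of a nested dict."""
    for name, value in entity.items():
        path = prefix + name
        if isinstance(value, dict):
            yield from flatten(value, prefix=path + ".")
        else:
            yield path


def colliding_paths(entity):
    """Return the column paths that would be set more than once."""
    counts = Counter(flatten(entity))
    return sorted(path for path, n in counts.items() if n > 1)


# Hypothetical Order entity with a nested Customer.
order = {
    "id": 1,
    "customer": {"name": "Ada", "email": "ada@example.com"},
}

# Same entity plus the flattened property created by the index.
order_with_index = dict(order)
order_with_index["customer.email"] = "ada@example.com"

print(colliding_paths(order))
print(colliding_paths(order_with_index))
```

In this model the entity without the index has no clashing paths, while the indexed one sets customer.email twice, which is the situation a non-repeated column cannot accept.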

The solution for us was to remove the indexes on nested entities, which avoids these conflicts when loading Datastore backups into BigQuery. Unfortunately we also had to delete all existing records in the database. That wasn't a big problem for us; if deleting the data is not an option, you would instead have to make sure the index is properly removed from the existing records.
