当使用GROUP BY选择到表中时,BigQuery即使使用“noflatten_results”国旗上 [英] BigQuery flattens result when selecting into table with GROUP BY even with "noflatten_results" flag on

查看:245
本文介绍了当使用GROUP BY选择到表中时,BigQuery即使使用“noflatten_results”国旗上的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张有重复记录的表格。我想删除它们。我创建了一个名为hash_code的列,它只是所有列的sha1哈希值。重复的行将具有相同的哈希码。除了当我尝试使用包含GROUP BY的查询创建新表时,一切都很好。我的表具有RECORD数据类型,但即使指定了不平坦的表格,也会创建新表格。看起来像GROUP BY,并且-noflatten_results标志不会很好。



以下是我运行的命令行示例:

  bq query --allow_large_results --destination_table mydataset.my_events --noflatten_results --replace 
select hash_code,min(event)as event,min(properties。 adgroup_name)作为properties.adgroup_name,$ b $ min(properties.adid)作为properties.adid,min(properties.app_id)作为properties.app_id,$ b $ min(properties.campaign_name)作为properties.campaign_name从mydataset。 my_orig_events通过hash_code将每个
组合在一起

在上面的例子中,properties是一个RECORD数据类型嵌套字段。结果表不具有RECORD数据类型的属性。相反,它将properties.adgroup_name转换为properties_adgroup_name等。



强制BigQuery将结果集视为RECORD而不是在GROUP BY中展开的任何方式? b
$ b

谢谢!

解决方案

尽管有一些已知的情况,尽管查询结果可以平滑请求unflattened results。


  1. 包含 GROUP BY 子句的查询

  2. 包含 ORDER BY 子句的查询
  3. 选择具有扁平别名的嵌套字段(例如 SELECT record.record.field AS flat_field )。请注意,这只会使应用了别名的特定字段变平坦,并且只有在该字段和其父记录不重复时才会变平。



<在这些情况下,BigQuery查询引擎总是会使查询结果变平。据我所知,这种行为没有解决方法,除了从查询中删除这些子句或别名。


I have a table with duplicate records. I want to remove them. I've created a column called "hash_code" which is just a sha1 hash of all the columns. Duplicate rows will have the same hash code. Everything is fine except when I tried to create a new table with a query containing GROUP BY. My table has RECORD data type, but the new table created flattens it even when I had specified it to not flatten. Seems like GROUP BY and the "-noflatten_results" flag doesn't place nice.

Here's an example command line I ran:

bq query --allow_large_results --destination_table mydataset.my_events --noflatten_results --replace 
"select hash_code, min(event) as event, min(properties.adgroup_name) as properties.adgroup_name, 
min(properties.adid) as properties.adid, min(properties.app_id) as properties.app_id,  
min(properties.campaign_name) as properties.campaign_name from mydataset.my_orig_events group each 
by hash_code "

In the above example, properties is a RECORD data type with nested fields. The resulting table doesn't have properties as RECORD data type. Instead it translated properties.adgroup_name to properties_adgroup_name, etc.

Any way to force BigQuery to treat the result set as RECORD and not flatten in GROUP BY?

Thanks!

解决方案

There are a few known cases where query results can be flattened despite requesting unflattened results.

  1. Queries containing a GROUP BY clause
  2. Queries containing an ORDER BY clause
  3. Selecting a nested field with a flat alias (e.g. SELECT record.record.field AS flat_field). Note that this only flattens the specific field with the alias applied, and only flattens the field if it and its parent records are non-repeated.

The BigQuery query engine always flattens query results in these cases. As far as I know, there is no workaround for this behavior, other than removing these clauses or aliases from the query.

这篇关于当使用GROUP BY选择到表中时,BigQuery即使使用“noflatten_results”国旗上的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆