BigQuery standard SQL: how to group by an ARRAY field
Question
My table has two columns, id and a. Column id contains a number, and column a contains an array of strings. I want to count the number of unique id values for a given array, with equality between arrays defined as "same size, same string at each index".
When using GROUP BY a, I get the error "Grouping by expressions of type ARRAY is not allowed". I can use something like GROUP BY ARRAY_TO_STRING(a, ","), but then the two arrays ["a,b"] and ["a","b"] are grouped together, and I lose the "real" value of my array (so if I want to use it later in another query, I have to split the string).
The values in this array field come from users, so I can't assume that some character will never appear (and use it as a separator).
Recommended answer
Instead of GROUP BY ARRAY_TO_STRING(a, ","), use GROUP BY TO_JSON_STRING(a), so your query will look like below:
#standardsql
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
You can test it with dummy data like below:
#standardsql
WITH `project.dataset.table` AS (
SELECT 1 id, ["a,b", "c"] a UNION ALL
SELECT 1, ["a","b,c"]
)
SELECT
TO_JSON_STRING(a) arr,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr
The result is:
Row arr cnt
1 ["a,b","c"] 1
2 ["a","b,c"] 1
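The difference between the two grouping keys can also be checked directly on literal arrays. The following is a minimal sketch, not part of the original answer; the column aliases same_joined_key and same_json_key are illustrative names. Joining on "," produces the same string for ["a,b"] and ["a","b"], while TO_JSON_STRING quotes each element (and escapes quotes and backslashes inside elements), so the two arrays should yield different keys.

```sql
#standardsql
-- Illustrative check with literal arrays (assumed example, not from the original post).
SELECT
  -- Both sides join to the string 'a,b', so the separator-based keys collide.
  ARRAY_TO_STRING(["a,b"], ",") = ARRAY_TO_STRING(["a", "b"], ",") AS same_joined_key,
  -- '["a,b"]' versus '["a","b"]': the JSON keys stay distinct.
  TO_JSON_STRING(["a,b"]) = TO_JSON_STRING(["a", "b"]) AS same_json_key
```

Here same_joined_key comes back true and same_json_key comes back false, which is why the JSON string is safe as a grouping key even when the array elements contain arbitrary user input.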
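Update based on @Ted's comment: to keep the original array in the output rather than its JSON string, group by TO_JSON_STRING(a) and select ANY_VALUE(a):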
#standardsql
SELECT
ANY_VALUE(a) a,
COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY TO_JSON_STRING(a)
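As with the first query, this version can be tried against dummy data. The sketch below reuses the inline-table pattern from the answer; the extra rows with id 2 and id 3 are made-up values added only so that one group contains more than one id.

```sql
#standardsql
-- Dummy data (assumed for illustration): ids 1 and 2 share the same array.
WITH `project.dataset.table` AS (
  SELECT 1 id, ["a,b", "c"] a UNION ALL
  SELECT 2, ["a,b", "c"] UNION ALL
  SELECT 3, ["a", "b,c"]
)
SELECT
  -- Every row in a group has the same array, so ANY_VALUE returns it unchanged.
  ANY_VALUE(a) a,
  COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
-- The JSON string is used only as the grouping key and is not returned.
GROUP BY TO_JSON_STRING(a)
```

With this data the group for ["a,b","c"] has cnt 2 and the group for ["a","b,c"] has cnt 1, and column a is still a real ARRAY that a later query can use without parsing a string.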