Distinct Count across BigQuery arrays
Question
I want to concatenate arrays across rows and then do a distinct count. Ideally, this would work:
WITH test AS
(
SELECT
DATE('2018-01-01') as date,
2 as value,
[1,2,3] as key
UNION ALL
SELECT
DATE('2018-01-02') as date,
3 as value,
[1,4,5] as key
)
SELECT
SUM(value) as total_value,
ARRAY_LENGTH(ARRAY_CONCAT_AGG(DISTINCT key)) as unique_key_count
FROM test
Unfortunately, the ARRAY_CONCAT_AGG function doesn't support the DISTINCT operator. I can unnest the array, but then I get a fan-out and the sum of the value column is wrong:
WITH test AS
(
SELECT
DATE('2018-01-01') as date,
2 as value,
[1,2,3] as key
UNION ALL
SELECT
DATE('2018-01-02') as date,
3 as value,
[1,4,5] as key
)
SELECT
SUM(value) as total_value,
COUNT(DISTINCT k) as unique_key_count
FROM test
CROSS JOIN UNNEST(key) k
Is there anything I'm missing that would allow me to avoid joining in the unnested array?
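To make the fan-out concrete, here is a small sketch (not part of the original question) that lists the rows the CROSS JOIN produces from the same sample data. Each source row appears once per array element, so each value is counted three times and SUM(value) over these rows is 2*3 + 3*3 = 15 rather than the intended 5.

```sql
-- Illustrative only: shows the intermediate rows behind the wrong total.
WITH test AS
(
  SELECT
    DATE('2018-01-01') as date,
    2 as value,
    [1,2,3] as key
  UNION ALL
  SELECT
    DATE('2018-01-02') as date,
    3 as value,
    [1,4,5] as key
)
SELECT
  date,
  value,  -- repeated once for every element of key
  k
FROM test
CROSS JOIN UNNEST(key) k
ORDER BY date, k
```

The query returns six rows: three with value 2 and three with value 3.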
Recommended answer
Here is an alternative:
CREATE TEMP FUNCTION DistinctCount(arr ANY TYPE) AS (
(SELECT COUNT(DISTINCT x) FROM UNNEST(arr) AS x)
);
WITH test AS
(
SELECT
DATE('2018-01-01') as date,
2 as value,
[1,2,3] as key
UNION ALL
SELECT
DATE('2018-01-02') as date,
3 as value,
[1,4,5] as key
)
SELECT
SUM(value) as total_value,
DistinctCount(ARRAY_CONCAT_AGG(key)) as unique_key_count
FROM test
This avoids a subquery and avoids joining the array with the table, which is what causes the duplicate values in the sum.
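For comparison, here is a sketch of the same result without a temporary function, assuming a two-step query is acceptable. Unlike the answer above, it does use a subquery. The CTE name agg and the column name all_keys are illustrative and do not come from the original answer. The aggregation runs first over the original rows, so the sum is not affected by the later UNNEST.

```sql
WITH test AS
(
  SELECT
    DATE('2018-01-01') as date,
    2 as value,
    [1,2,3] as key
  UNION ALL
  SELECT
    DATE('2018-01-02') as date,
    3 as value,
    [1,4,5] as key
),
agg AS
(
  -- Step 1: aggregate once per group, before any unnesting.
  SELECT
    SUM(value) as total_value,
    ARRAY_CONCAT_AGG(key) as all_keys  -- hypothetical column name
  FROM test
)
SELECT
  total_value,
  -- Step 2: count distinct elements of the already concatenated array.
  (SELECT COUNT(DISTINCT k) FROM UNNEST(all_keys) AS k) as unique_key_count
FROM agg
```

With the sample data, this yields total_value = 5 and unique_key_count = 5 (the distinct keys are 1, 2, 3, 4, 5).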