Distinct Count across BigQuery arrays

Question

I want to concatenate arrays across rows and then do a distinct count. Ideally, this would work:

WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
)
SELECT
  SUM(value) as total_value,
  ARRAY_LENGTH(ARRAY_CONCAT_AGG(DISTINCT key)) as unique_key_count
FROM test

Unfortunately, the ARRAY_CONCAT_AGG function doesn't support the DISTINCT operator. I can unnest the array, but then I get a fanout and the sum of the value column is wrong, because each row is repeated once per array element (here SUM(value) returns 2*3 + 3*3 = 15 instead of 5):

WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
)

SELECT
  SUM(value) as total_value,
  COUNT(DISTINCT k) as unique_key_count

FROM test
  CROSS JOIN UNNEST(key) k
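
To make the fanout visible, the join can be run without any aggregation. This is only an illustrative sketch on the same sample data; the ORDER BY is added purely for readability.

```sql
WITH test AS
(
  SELECT DATE('2018-01-01') AS date, 2 AS value, [1, 2, 3] AS key
  UNION ALL
  SELECT DATE('2018-01-02') AS date, 3 AS value, [1, 4, 5] AS key
)
-- One output row per (source row, array element) pair: 6 rows in total.
-- Each value therefore appears three times, which inflates SUM(value).
SELECT date, value, k
FROM test
  CROSS JOIN UNNEST(key) AS k
ORDER BY date, k
```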

Is there anything I'm missing that would allow me to avoid joining in the unnested array?

Recommended answer

Here is an alternative:

CREATE TEMP FUNCTION DistinctCount(arr ANY TYPE) AS (
  (SELECT COUNT(DISTINCT x) FROM UNNEST(arr) AS x)
);

WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
)

SELECT
  SUM(value) as total_value,
  DistinctCount(ARRAY_CONCAT_AGG(key)) as unique_key_count
FROM test

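This avoids writing a subquery in the main query and avoids joining the unnested array with the table, which would duplicate values in the sum. On the sample data the aggregated array holds the elements 1, 2, 3, 1, 4, 5, so unique_key_count is 5 and total_value is 2 + 3 = 5.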
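
If a temporary function is not wanted, the same computation can probably be written inline. The sketch below assumes that BigQuery accepts an aggregate call such as ARRAY_CONCAT_AGG(key) as the argument of UNNEST inside a scalar subquery in the SELECT list; it is not part of the original answer and has not been verified here.

```sql
WITH test AS
(
  SELECT DATE('2018-01-01') AS date, 2 AS value, [1, 2, 3] AS key
  UNION ALL
  SELECT DATE('2018-01-02') AS date, 3 AS value, [1, 4, 5] AS key
)
SELECT
  SUM(value) AS total_value,
  -- Assumption: the concatenated array is built once per group and then
  -- unnested inside the scalar subquery, so the rows of test are never
  -- joined with their own elements and SUM(value) is not inflated.
  (SELECT COUNT(DISTINCT k) FROM UNNEST(ARRAY_CONCAT_AGG(key)) AS k) AS unique_key_count
FROM test
```

The temporary function keeps the main query shorter and can be reused for several array columns, while the inline form keeps everything in a single statement.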
