Sum Distinct by key (or Symmetric Aggregates) function in BigQuery
Problem description
I'm trying to sum fanned/duplicated values by de-duping on their key. Looker calls this Symmetric Aggregates. I'd like to use a persistent UDF and not lean on subqueries. Looker has a pretty elegant solution that seems like it could be baked into a UDF.
I tried:
CREATE OR REPLACE FUNCTION `fn.sumdistinct`(unique_key ANY TYPE, val_to_sum ANY TYPE) AS (
COALESCE(ROUND(COALESCE(CAST((SUM(DISTINCT (CAST(ROUND(COALESCE(safe_cast(val_to_sum as float64) ,0)*(1/1000*1.0), 9) AS NUMERIC) + (cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001 )) - SUM(DISTINCT (cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001) ) / (1/1000*1.0) AS FLOAT64), 0), 6), 0)
);
But I get:
Invalid function fn.sumdistinct. Aggregate function SUM not allowed in templated SQL function call
I'm looking for a function that can turn this:
id val
1 100
2 200
2 200
3 300
3 300
3 300
into:
unique_ids total_value
3 600
Recommended answer
Below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION SumDistinct(arr ANY TYPE) AS ((
SELECT AS STRUCT
COUNT(DISTINCT id) unique_ids,
SUM(val) total_value
FROM (
SELECT ANY_VALUE(t).*
FROM UNNEST(arr) t
GROUP BY FORMAT('%t', t)
)
));
SELECT SumDistinct(ARRAY_AGG(STRUCT(id, val))).*
FROM `project.dataset.data`
Applied to the sample data from your question, the result is:
Row unique_ids total_value
1 3 600
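Note that the function above de-duplicates on the whole row (GROUP BY FORMAT('%t', t)), so two rows with the same id but different val would both be summed. If the intent is to de-duplicate strictly on the key, a minimal sketch could group on id alone and pick one val per key. The function name SumDistinctByKey and the inline data CTE are hypothetical, introduced only so the query can be run without an existing table:

```sql
#standardSQL
-- Hypothetical variant: de-duplicate on the key (id) only,
-- keeping an arbitrary val for each id via ANY_VALUE.
CREATE TEMP FUNCTION SumDistinctByKey(arr ANY TYPE) AS ((
  SELECT AS STRUCT
    COUNT(*) AS unique_ids,      -- one row per distinct id after grouping
    SUM(val) AS total_value
  FROM (
    SELECT t.id AS id, ANY_VALUE(t.val) AS val
    FROM UNNEST(arr) t
    GROUP BY t.id
  )
));

-- Inline copy of the sample data from the question (assumed column types: INT64).
WITH data AS (
  SELECT 1 AS id, 100 AS val UNION ALL
  SELECT 2, 200 UNION ALL
  SELECT 2, 200 UNION ALL
  SELECT 3, 300 UNION ALL
  SELECT 3, 300 UNION ALL
  SELECT 3, 300
)
SELECT SumDistinctByKey(ARRAY_AGG(STRUCT(id, val))).*
FROM data;
-- unique_ids = 3, total_value = 600
```

The aggregation is legal here because SUM runs inside a subquery over the UNNESTed array argument, rather than directly on the function's parameters as in the attempt from the question. Since the question asks for a persistent UDF, replacing CREATE TEMP FUNCTION with CREATE OR REPLACE FUNCTION and a dataset-qualified name (for example `fn.sumdistinct`) should work the same way, although that has not been verified here.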