Sum Distinct by key (or Symmetric Aggregates) function in BigQuery

Question

I'm trying to sum fanned/duplicated values by de-duping on their key. Looker calls this Symmetric Aggregates. I'd like to use a persistent UDF and not lean on subqueries. Looker has a pretty elegant solution that seems like it could be baked into a UDF.

I tried:

CREATE OR REPLACE FUNCTION `fn.sumdistinct`(unique_key ANY TYPE, val_to_sum ANY TYPE) AS (
 COALESCE(ROUND(COALESCE(CAST((SUM(DISTINCT (CAST(ROUND(COALESCE(safe_cast(val_to_sum as float64) ,0)*(1/1000*1.0), 9) AS NUMERIC) + (cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001 )) - SUM(DISTINCT (cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key  AS STRING))), 1, 15)) as int64) as numeric) * 4294967296 + cast(cast(concat('0x', substr(to_hex(md5(CAST(unique_key  AS STRING))), 16, 8)) as int64) as numeric)) * 0.000000001) )  / (1/1000*1.0) AS FLOAT64), 0), 6), 0)
);

But I get:

Invalid function fn.sumdistinct. Aggregate function SUM not allowed in templated SQL function call

I'm looking for a function that can turn this:

id   val
1    100
2    200
2    200
3    300
3    300
3    300

into this:

unique_ids  total_value
3           600
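
For reference, the target result is what a plain de-duplicating subquery would return. The following is a minimal sketch, not part of the original question: the `sample` CTE is a hypothetical stand-in for the fanned-out table, and the inner query keeps one `val` per `id` before summing. It uses exactly the subquery the question hopes to avoid, so it only serves as a baseline to check any UDF against.

```sql
#standardSQL
-- Hypothetical inline copy of the sample rows from the question.
WITH sample AS (
  SELECT 1 AS id, 100 AS val UNION ALL
  SELECT 2, 200 UNION ALL
  SELECT 2, 200 UNION ALL
  SELECT 3, 300 UNION ALL
  SELECT 3, 300 UNION ALL
  SELECT 3, 300
)
SELECT
  COUNT(*) AS unique_ids,   -- one row per id after de-duplication
  SUM(val) AS total_value   -- each id's value is counted once
FROM (
  -- Keep a single val per key; assumes val is identical for a given id.
  SELECT id, ANY_VALUE(val) AS val
  FROM sample
  GROUP BY id
);
```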

Accepted answer

Below is for BigQuery Standard SQL

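#standardSQL
-- Takes an array of STRUCT(id, val) rows and returns a single STRUCT.
CREATE TEMP FUNCTION SumDistinct(arr ANY TYPE) AS ((
  SELECT AS STRUCT
    COUNT(DISTINCT id) AS unique_ids,
    SUM(val) AS total_value
  FROM (
    -- Collapse identical (id, val) rows: FORMAT('%t', t) renders the whole
    -- struct as a string, so exact duplicates land in one group.
    SELECT ANY_VALUE(t).*
    FROM UNNEST(arr) t
    GROUP BY FORMAT('%t', t)
  )
));
-- Aggregate the rows into an array first, then expand the returned STRUCT.
SELECT SumDistinct(ARRAY_AGG(STRUCT(id, val))).*
FROM `project.dataset.data`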

Applied to the sample data from your question, the result is:

Row unique_ids  total_value  
1   3           600 
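
To try the answer without creating a table, the sample rows can be inlined. This is a sketch under that assumption: the `sample` CTE is hypothetical and stands in for `project.dataset.data`, while the function body is the same as in the answer. Note that the function de-duplicates on the whole (id, val) struct rather than on `id` alone, so two rows sharing an `id` but carrying different `val` values would both be summed.

```sql
#standardSQL
CREATE TEMP FUNCTION SumDistinct(arr ANY TYPE) AS ((
  SELECT AS STRUCT
    COUNT(DISTINCT id) AS unique_ids,
    SUM(val) AS total_value
  FROM (
    -- One row per distinct (id, val) combination.
    SELECT ANY_VALUE(t).*
    FROM UNNEST(arr) t
    GROUP BY FORMAT('%t', t)
  )
));

-- Hypothetical inline copy of the sample rows from the question.
WITH sample AS (
  SELECT 1 AS id, 100 AS val UNION ALL
  SELECT 2, 200 UNION ALL
  SELECT 2, 200 UNION ALL
  SELECT 3, 300 UNION ALL
  SELECT 3, 300 UNION ALL
  SELECT 3, 300
)
SELECT SumDistinct(ARRAY_AGG(STRUCT(id, val))).*
FROM sample;
```

Because the aggregation happens inside the function over an array argument, the caller supplies `ARRAY_AGG(...)` and the function itself is never asked to aggregate across rows, which is what the templated-function error in the question objected to.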
