BigQuery aggregate on a daily basis


Question

I have a table in BigQuery (a data warehouse); its rows are listed as JSON in the update at the end of this question.

I would like to get one total per day.

Here is how each daily total should be calculated (row numbers refer to the data rows in the order they appear in the JSON update below):

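  1. 2017-10-01 = $100, obviously, because there is only one record.
  2. 2017-10-02 = $400, the sum of rows 1 and 3. Why? Because rows 2 and 3 belong to the same invoice (T002), so we use only its latest update.
  3. 2017-10-04 = $800, the sum of rows 1, 3 and 4. Why? Because each invoice is counted only once per day, using its latest record: row 1 (T001), row 3 (T002), row 4 (T003).
  4. 2017-10-05 = $1,100, the sum of rows 1, 5 and 6. Why? For the same reason, each invoice is counted once using its latest record: row 1 (T001), row 5 (T002), row 6 (T003).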
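The rule in the list can be sketched in plain Python: for a given reporting day, keep the most recent record of each invoice updated on or before that day, then add up those amounts. This is an illustrative sketch using the sample rows from the question; `ROWS` and `total_as_of` are hypothetical names, and only days that have at least one update are reported, matching the expected result (2017-10-03 has no row).

```python
from datetime import datetime

# Sample rows from the question: (UpdatedDate, InvNo, amount)
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]
FMT = "%Y-%m-%d %H:%M:%S"


def total_as_of(rows, day):
    """Sum the latest amount of every invoice updated on or before `day`."""
    latest = {}  # InvNo -> (UpdatedDate, amount)
    for ts_text, inv_no, amount in rows:
        ts = datetime.strptime(ts_text, FMT)
        if ts.date() > day:
            continue  # this update happens after the reporting day
        if inv_no not in latest or ts > latest[inv_no][0]:
            latest[inv_no] = (ts, amount)
    return sum(amount for _, amount in latest.values())


# Report only the days that actually have updates, like the expected result.
days = sorted({datetime.strptime(t, FMT).date() for t, _, _ in ROWS})
for day in days:
    print(day, total_as_of(ROWS, day))
```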

Honestly, I'm completely lost on how to do this. I have tried GROUP BY and similar approaches many times, but none of them work as expected. This is my latest attempt so far:

SELECT 
  amount,
  updatedDateOnly,
  invNo
FROM 
(
  SELECT 
    invNo,
    UpdatedDate,
    amount,
    DATE(updatedDate) as updatedDateOnly,
    row_number() OVER (PARTITION BY  invNo ORDER BY UpdatedDate DESC) AS rownum
  FROM [project:dataset.test] 
)
WHERE
  rownum = 1

This only returns the last date of each invoice. I have no idea how to query this on a daily basis.
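The attempt partitions by invoice only, so `rownum = 1` keeps exactly one row per invoice: its latest update overall. The plain-Python sketch below (hypothetical helper name `latest_per_invoice`) reproduces that filter on the sample data to show the effect: 2017-10-02 and 2017-10-04 disappear because no invoice's final update falls on those days. What is missing is an "as of" step that repeats the filter for each reporting day.

```python
# Sample rows from the question: (UpdatedDate, InvNo, amount)
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]


def latest_per_invoice(rows):
    """Keep one row per invoice, like
    ROW_NUMBER() OVER (PARTITION BY invNo ORDER BY UpdatedDate DESC) = 1."""
    best = {}  # InvNo -> (UpdatedDate, amount)
    for ts, inv_no, amount in rows:
        # "YYYY-MM-DD hh:mm:ss" strings sort in chronological order.
        if inv_no not in best or ts > best[inv_no][0]:
            best[inv_no] = (ts, amount)
    # Keep only the date part, like DATE(updatedDate) AS updatedDateOnly.
    return {inv: (ts[:10], amount) for inv, (ts, amount) in best.items()}


print(latest_per_invoice(ROWS))
# {'T001': ('2017-10-01', 100), 'T002': ('2017-10-05', 500), 'T003': ('2017-10-05', 500)}
```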

I'd appreciate help from anyone with the expertise to write this query. Thank you.

UPDATE: Here is the data as JSON, in case you want to try it in BigQuery or another SQL server:

{"UpdatedDate":"2017-10-01 01:00:00","InvNo":"T001","amount":100}
{"UpdatedDate":"2017-10-02 01:00:00","InvNo":"T002","amount":200}
{"UpdatedDate":"2017-10-02 02:00:00","InvNo":"T002","amount":300}
{"UpdatedDate":"2017-10-04 01:00:00","InvNo":"T003","amount":400}
{"UpdatedDate":"2017-10-05 01:00:00","InvNo":"T002","amount":500}
{"UpdatedDate":"2017-10-05 02:00:00","InvNo":"T003","amount":500}
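For trying the logic outside BigQuery, here is a minimal sketch that loads the JSON rows above (one object per line) into an in-memory SQLite database with Python's `json` and `sqlite3` modules and runs a portable query without window functions. It is a SQLite adaptation, not the BigQuery query from the answer: the table name `test`, the text-typed timestamp column, and the assumption that no invoice has two updates with the same timestamp are choices made for this sketch.

```python
import json
import sqlite3

# The JSON rows from the question, one object per line.
NDJSON = """\
{"UpdatedDate":"2017-10-01 01:00:00","InvNo":"T001","amount":100}
{"UpdatedDate":"2017-10-02 01:00:00","InvNo":"T002","amount":200}
{"UpdatedDate":"2017-10-02 02:00:00","InvNo":"T002","amount":300}
{"UpdatedDate":"2017-10-04 01:00:00","InvNo":"T003","amount":400}
{"UpdatedDate":"2017-10-05 01:00:00","InvNo":"T002","amount":500}
{"UpdatedDate":"2017-10-05 02:00:00","InvNo":"T003","amount":500}
"""

# Timestamps are ISO strings, so text comparison matches chronological order.
DAILY_TOTALS_SQL = """
WITH days AS (
  SELECT DISTINCT substr(UpdatedDate, 1, 10) AS day FROM test
),
latest AS (
  -- for every reporting day, the last update of each invoice on or before it
  SELECT d.day, t.InvNo, MAX(t.UpdatedDate) AS last_ts
  FROM days d
  JOIN test t ON substr(t.UpdatedDate, 1, 10) <= d.day
  GROUP BY d.day, t.InvNo
)
SELECT l.day, SUM(t.amount) AS amount
FROM latest l
JOIN test t ON t.InvNo = l.InvNo AND t.UpdatedDate = l.last_ts
GROUP BY l.day
ORDER BY l.day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (UpdatedDate TEXT, InvNo TEXT, amount INTEGER)")
records = [json.loads(line) for line in NDJSON.splitlines() if line.strip()]
# Named placeholders are filled from each parsed JSON object (a dict).
conn.executemany("INSERT INTO test VALUES (:UpdatedDate, :InvNo, :amount)", records)

rows = conn.execute(DAILY_TOTALS_SQL).fetchall()
for day, amount in rows:
    print(day, amount)
```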

Recommended answer

The query below is for BigQuery Standard SQL:

#standardSQL
WITH dates AS (
  SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
  FROM `project.dataset.test`
),
qualified AS (
  SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM `project.dataset.test`
  GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
  SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM dates d
  JOIN qualified q
  ON q.UpdatedDay <= d.UpdatedDay
  GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
-- ORDER BY UpdatedDay
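To see what each step of the query produces, here is a plain-Python sketch that mirrors it on the sample rows: `dates` lists the days with updates, `qualified` keeps the last amount per (day, invoice), the join pairs every reporting day with all qualified rows on or before it, and `ARRAY_AGG(... ORDER BY q.UpdatedDay DESC LIMIT 1)` picks the most recent of those per invoice. The variable names follow the query's CTE names, but the code is only an illustration of the logic, not of how BigQuery executes it.

```python
from collections import defaultdict

# Sample rows from the question: (UpdatedDate, InvNo, amount)
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]

# `dates`: distinct calendar days that have at least one update.
dates = sorted({ts[:10] for ts, _, _ in ROWS})

# `qualified`: per (day, invoice), the amount of the last update on that day.
qualified = {}
for ts, inv_no, amount in sorted(ROWS):  # ISO strings sort chronologically
    qualified[(ts[:10], inv_no)] = amount  # later updates overwrite earlier ones

# Join on q.UpdatedDay <= d.UpdatedDay, then keep the latest qualified day
# per (reporting day, invoice), as ARRAY_AGG(... DESC LIMIT 1) does.
picked = {}
for d in dates:
    for (q_day, inv_no), amount in qualified.items():
        if q_day <= d:
            prev = picked.get((d, inv_no))
            if prev is None or q_day > prev[0]:
                picked[(d, inv_no)] = (q_day, amount)

# Outer SELECT: SUM(amount) per reporting day.
totals = defaultdict(int)
for (d, _), (_, amount) in picked.items():
    totals[d] += amount

print(qualified)
print(dict(totals))
```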

You can test / play with it using the dummy data from your question:

#standardSQL
WITH `project.dataset.test` AS (
  SELECT TIMESTAMP '2017-10-01 01:00:00' UpdatedDate, 'T001' InvNo, 100 amount UNION ALL
  SELECT TIMESTAMP '2017-10-02 01:00:00', 'T002', 200 UNION ALL
  SELECT TIMESTAMP '2017-10-02 02:00:00', 'T002', 300 UNION ALL
  SELECT TIMESTAMP '2017-10-04 01:00:00', 'T003', 400 UNION ALL
  SELECT TIMESTAMP '2017-10-05 01:00:00', 'T002', 500 UNION ALL
  SELECT TIMESTAMP '2017-10-05 02:00:00', 'T003', 500 
),
dates AS (
  SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
  FROM `project.dataset.test`
),
qualified AS (
  SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM `project.dataset.test`
  GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
  SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM dates d
  JOIN qualified q
  ON q.UpdatedDay <= d.UpdatedDay
  GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
ORDER BY UpdatedDay

The result is as expected:

UpdatedDay  amount   
2017-10-01   100     
2017-10-02   400     
2017-10-04   800     
2017-10-05  1100     
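The join in the answer compares every reporting day with every earlier (day, invoice) row, so its size grows roughly with days × invoices. As an alternative that is not part of the answer, the same totals can be produced in one pass over the rows sorted by time, carrying the latest amount of each invoice forward and adjusting a running total. A minimal Python sketch, assuming ISO-formatted timestamps and no two updates of the same invoice with the same timestamp; `daily_totals` is a hypothetical name:

```python
from itertools import groupby

# Sample rows from the question: (UpdatedDate, InvNo, amount)
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]


def daily_totals(rows):
    """Return [(day, total)], where total sums each invoice's latest amount up to that day."""
    latest = {}  # InvNo -> most recent amount seen so far
    running = 0  # equals sum(latest.values()), maintained incrementally
    result = []
    # ISO timestamps sort chronologically, so rows are processed in time order.
    for day, group in groupby(sorted(rows), key=lambda row: row[0][:10]):
        for _, inv_no, amount in group:
            # Replace the invoice's previous contribution with its new amount.
            running += amount - latest.get(inv_no, 0)
            latest[inv_no] = amount
        result.append((day, running))
    return result


for day, total in daily_totals(ROWS):
    print(day, total)
```

In SQL terms, the same idea would amount to computing each row's change from the invoice's previous amount (for example with `LAG` over a window partitioned by `InvNo`) and then taking a cumulative sum over days; this is only a sketch of the idea, not a replacement for the answer's query.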
