在滚动时间范围内计算唯一 ID [英] Count unique ids in a rolling time frame

查看:29
本文介绍了在滚动时间范围内计算唯一 ID的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个简单的表格,上面有很多 ID 和日期.

I have a simple table as bellow with lots of IDs and dates.

ID      Date
10R46   2014-11-23  
10R46   2016-04-11  
100R9   2016-12-21
10R91   2013-05-03 
...     ...

我想制定一个查询来计算日期滚动时间范围内的唯一 ID,例如十天.这意味着对于每个日期,它应该给我该日期和 10 天前之间唯一 ID 的数量.结果应该是这样的.

I want to formulate a query which counts the unique IDs for a rolling time frame of dates, for example ten days. Meaning that for each date it should give me the number of unique IDs between that date and 10 days back. Result should look something like this.

UniqueTenDays    Date
200              2014-11-23 
324              2014-11-24 
522              2014-11-25
532              2014-11-26 
...              ...

类似于下面的内容,但我意识到我需要应用 WHERE 子句并以某种方式计算每个日期的 ID.

Something along the lines of the bellow but I realise I need to apply the WHERE clause and count the IDs for each Date somehow.

SELECT Date, COUNT(DISTINCT ID)
FROM T 
WHERE Date BETWEEN DATE_SUB(Date, INTERVAL 10 DAY) AND Date
GROUP BY Date
ORDER BY Date

提前致谢.

推荐答案

以下为 BigQuery Standard SQL

Below is for BigQuery Standard SQL

#standardSQL
WITH temp1 AS (
  SELECT dt, STRING_AGG(DISTINCT id) AS users
  FROM `project.dataset.yourtable`
  GROUP BY dt
), temp2 AS (
  SELECT
    dt, 
    STRING_AGG(users) OVER(ORDER BY UNIX_DATE(dt) RANGE BETWEEN 10 PRECEDING AND CURRENT ROW) users
  FROM temp1
)
SELECT dt, 
  (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) UniqueTenDays
FROM temp2

您可以使用以下虚拟数据测试/玩它

you can test / play with it using dummy data as below

#standardSQL
WITH `project.dataset.yourtable` AS (
  SELECT '10R46' id,  DATE '2014-11-23' dt UNION ALL  
  SELECT '10R46',     DATE '2016-04-11' UNION ALL  
  SELECT '10R46',     DATE '2016-04-12' UNION ALL  
  SELECT '10R47',     DATE '2016-04-13' UNION ALL  
  SELECT '10R48',     DATE '2016-04-14' UNION ALL  
  SELECT '100R9',     DATE '2016-12-21' UNION ALL
  SELECT '10R91',     DATE '2013-05-03'
), temp1 AS (
  SELECT dt, STRING_AGG(DISTINCT id) AS users
  FROM `project.dataset.yourtable`
  GROUP BY dt
), temp2 AS (
  SELECT
    dt, 
    STRING_AGG(users) OVER(ORDER BY UNIX_DATE(dt) RANGE BETWEEN 10 PRECEDING AND CURRENT ROW) users
  FROM temp1
)
SELECT dt,  
  (SELECT COUNT(DISTINCT id) FROM UNNEST(SPLIT(users)) AS id) UniqueTenDays
FROM temp2

这篇关于在滚动时间范围内计算唯一 ID的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆