Cumulative distinct count
Question
I am working on a query to get the cumulative distinct count of uids on a daily basis.
Example: say two uids (100, 200) appear on 2016-11-01, and both appear again the next day, 2016-11-02, together with a new uid 300 (100, 200, 300). At that point I want the stored cumulative count to be 3, not 5, because uids 100 and 200 already appeared on the previous day.
Input table:
date uid
2016-11-01 100
2016-11-01 200
2016-11-01 300
2016-11-01 400
2016-11-02 100
2016-11-02 200
2016-11-03 300
2016-11-03 400
2016-11-03 500
2016-11-03 600
2016-11-04 700
Expected query result:
date daily_cumulative_count
2016-11-01 4
2016-11-02 4
2016-11-03 6
2016-11-04 7
So far I can get a running total of the per-day distinct counts, but it counts uids again that already appeared on earlier days.
SELECT
    date,
    SUM(count) OVER (
        ORDER BY date ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
FROM (
    SELECT
        date,
        COUNT(DISTINCT uid) AS count
    FROM sample_table
    GROUP BY 1
)
ORDER BY date DESC;
Any kind of help would be greatly appreciated.
Recommended answer
The simplest way:
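-- Inner query: one row per uid with the first date it appeared.
-- Outer query: running count of uids ordered by that first date;
-- rows sharing the same fst_date get the same cum_uids value.
SELECT *, COUNT(*) OVER (ORDER BY fst_date) AS cum_uids
FROM (
    SELECT uid, MIN(date) AS fst_date FROM t GROUP BY uid
) t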
Or something along those lines.
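The query in the answer returns one row per uid, and a day with no new uids (2016-11-02 in the sample) does not show up at all. A minimal sketch of one way to get from there to the one-row-per-date result in the question: count the new uids per first-seen date, left join that onto every date in the table, and take a running sum. It is wrapped in Python's sqlite3 module only so it can be run as-is; the CTE names first_seen, new_per_day and days, the column new_uids, and the helper cumulative_distinct_counts are made up for this illustration, and it assumes SQLite 3.25 or later for window functions. Other databases may need small syntax changes.

```python
import sqlite3

# Sample rows from the question: (date, uid).
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400),
    ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

QUERY = """
WITH first_seen AS (
    -- One row per uid: the first date on which it appeared.
    SELECT uid, MIN(date) AS fst_date
    FROM sample_table
    GROUP BY uid
),
new_per_day AS (
    -- Number of uids whose first appearance falls on each date.
    SELECT fst_date, COUNT(*) AS new_uids
    FROM first_seen
    GROUP BY fst_date
),
days AS (
    -- Every date in the table, including days with no new uids.
    SELECT DISTINCT date FROM sample_table
)
SELECT d.date,
       SUM(COALESCE(n.new_uids, 0)) OVER (ORDER BY d.date)
           AS daily_cumulative_count
FROM days AS d
LEFT JOIN new_per_day AS n ON n.fst_date = d.date
ORDER BY d.date
"""


def cumulative_distinct_counts(rows):
    """Load rows into an in-memory table and return (date, count) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
        conn.executemany("INSERT INTO sample_table VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for day, total in cumulative_distinct_counts(ROWS):
        print(day, total)
```

On the sample data this prints 4, 4, 6 and 7 for 2016-11-01 through 2016-11-04, matching the expected result in the question. The left join is what keeps 2016-11-02 in the output with an unchanged count.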