Cumulative distinct count

Problem description

I am working on a query to get the cumulative distinct count of uids on a daily basis.

Example: say two uids (100, 200) appear on 2016-11-01, and on the next day, 2016-11-02, they appear again together with a new uid 300 (100, 200, 300). At that point I want the stored cumulative count to be 3, not 5, because uids 100 and 200 already appeared on the previous day.

Input table:

date            uid
2016-11-01      100
2016-11-01      200
2016-11-01      300
2016-11-01      400
2016-11-02      100
2016-11-02      200
2016-11-03      300
2016-11-03      400
2016-11-03      500
2016-11-03      600
2016-11-04      700

Expected query result:

date            daily_cumulative_count
2016-11-01      4
2016-11-02      4
2016-11-03      6
2016-11-04      7

So far I am able to get a cumulative count per day, but it counts again the uids that were already counted on previous days.

SELECT
  date,
  SUM(count) OVER (
    ORDER BY date ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM (
  SELECT
    date,
    COUNT(DISTINCT uid) AS count
  FROM sample_table
  GROUP BY 1
) AS daily
ORDER BY date DESC;

Any kind of help would be greatly appreciated.
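
To make the overcount concrete, here is a small sketch that loads the sample rows from the input table into an in-memory SQLite database through Python's `sqlite3` module and runs essentially the same query as the attempt (output sorted ascending, with the inner column renamed to `daily_count`). SQLite and the helper name `run_attempt` are assumptions chosen for illustration and are not part of the original question; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows copied from the input table in the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400),
    ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

# The attempted approach: count distinct uids per day, then add the
# daily counts up. A uid seen on several days is counted once per day.
ATTEMPT = """
SELECT date,
       SUM(daily_count) OVER (
           ORDER BY date ASC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM (
    SELECT date, COUNT(DISTINCT uid) AS daily_count
    FROM sample_table
    GROUP BY date
) AS daily
ORDER BY date ASC
"""


def run_attempt(rows):
    """Hypothetical helper: load rows into SQLite and run ATTEMPT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
    conn.executemany("INSERT INTO sample_table VALUES (?, ?)", rows)
    result = conn.execute(ATTEMPT).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for date, total in run_attempt(ROWS):
        print(date, total)
```

On the sample data the running totals come out as 4, 6, 10, 11 instead of the expected 4, 4, 6, 7, because the per-day distinct counts (4, 2, 4, 1) are simply summed and repeat visitors are counted on every day they appear.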

Recommended answer

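The simplest approach is to find the first date on which each uid appears, then take a running count over those first dates: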

SELECT *,
       COUNT(*) OVER (ORDER BY fst_date) AS cum_uids
FROM (
    SELECT uid, MIN(date) AS fst_date
    FROM t
    GROUP BY uid
) t;

Or something along those lines.
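
Note that the answer's query returns one row per uid, and it produces no row for a date on which no new uid appears (2016-11-02 in the sample data). One possible way to get exactly one row per date is sketched below: count the uids first seen on each date, left-join that onto the list of all dates, and take a running sum. The sketch runs the SQL against an in-memory SQLite database via Python's `sqlite3` module so it can be checked against the sample rows. SQLite, the CTE names `first_seen` and `per_day`, and the helper `cumulative_distinct` are assumptions for illustration and do not come from the original answer; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows copied from the input table in the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200),
    ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400),
    ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

QUERY = """
WITH first_seen AS (
    -- first date on which each uid appears
    SELECT uid, MIN(date) AS fst_date
    FROM sample_table
    GROUP BY uid
),
per_day AS (
    -- every date in the table, with the number of uids first seen that day
    -- (COUNT(f.uid) is 0 for dates that bring no new uid)
    SELECT d.date, COUNT(f.uid) AS new_uids
    FROM (SELECT DISTINCT date FROM sample_table) AS d
    LEFT JOIN first_seen AS f ON f.fst_date = d.date
    GROUP BY d.date
)
SELECT date,
       SUM(new_uids) OVER (ORDER BY date) AS daily_cumulative_count
FROM per_day
ORDER BY date
"""


def cumulative_distinct(rows):
    """Hypothetical helper: load rows into SQLite and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
    conn.executemany("INSERT INTO sample_table VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for date, total in cumulative_distinct(ROWS):
        print(date, total)
```

On the sample data this gives 4, 4, 6, 7 for the four dates, matching the expected result in the question. The same SQL should carry over to other engines with window functions, though the details (for example quoting of a column named `date`) may differ.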
