Redshift - Calculate monthly active users

Question

I have a table which looks like this:

Date       | User_ID
2017-1-1   |  1
2017-1-1   |  2
2017-1-1   |  4
2017-1-2   |  3
2017-1-2   |  2
...        |  ..
...        |  ..
...        |  ..
...        |  ..
2017-2-1   |  1
2017-2-2   |  2
...        |  ..
...        |  ..
...        |  ..

I'd like to calculate monthly active users (MAU) over a rolling 30-day period. I know Redshift does not support COUNT(DISTINCT) as a window function. What can I do to get the following output?

Date      | MAU
2017-1-1  | 3
2017-1-2  | 4    <- We don't want to count user_id 2 twice.
...       | ..
...       | ..
...       | ..
2017-2-1  | ..
2017-2-2  | ..
...       | ..
...       | ..

I attempted to do this (and clearly failed). Here's my code:

SELECT event_date
    ,sum(user_count) mau_count
    ,CASE
        WHEN event_date = date_trunc('week', event_date)
            THEN 1
        ELSE 0
        END week_starting FROM (
    SELECT event_date
        ,count(*) OVER (PARTITION BY event_date ORDER BY event_date ROWS BETWEEN 30 PRECEDING
                    AND CURRENT ROW
            ) AS user_count    -- I know this is wrong. Just my attempt :)
    FROM (
        SELECT DISTINCT (user_id)
            ,event_date
        FROM event_table
        ) daily_distinct_users
    GROUP BY event_date
    ) cumulative_daily_distinct_users GROUP BY event_date;

Please let me know how I can get the MAU count accurately. Thanks!

Recommended answer

This one seems to work (column names in the log table are dt and userid):

SELECT
  end_date,
  -- The number of distinct users in the window ending on end_date
  COUNT(DISTINCT userid) AS distinct_users
FROM log
JOIN
( -- The list of dates to appear in the first output column
  SELECT DISTINCT dt AS end_date
  FROM log
  WHERE dt BETWEEN date '2017-01-01' AND date '2017-01-31'
) AS end_dates
  -- BETWEEN is inclusive at both ends, so this window spans 31 calendar days;
  -- use interval '29 days' for a window of exactly 30 days
  ON dt BETWEEN end_date - interval '30 days' AND end_date
GROUP BY end_date
ORDER BY end_date;

Basically, the sub-select generates the list of end_dates that appear in the first output column. Each end_date is then joined to every log row dated within the 30 days before it (or on it), and COUNT(DISTINCT userid) counts each user once per end_date.
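To try the self-join idea on a handful of rows without a Redshift cluster, here is a minimal sketch that runs the same pattern in SQLite through Python's sqlite3 module. It is an illustration under assumptions, not the answer's exact query: the sample rows and the rolling_mau helper are hypothetical, dates are stored as zero-padded ISO text, and SQLite's date(end_date, '-30 days') stands in for Redshift's end_date - interval '30 days'.

```python
import sqlite3

# Hypothetical sample rows (dt, userid), modelled on the question's table.
# Dates are zero-padded ISO strings so that text comparison orders them correctly.
ROWS = [
    ("2017-01-01", 1),
    ("2017-01-01", 2),
    ("2017-01-01", 4),
    ("2017-01-02", 3),
    ("2017-01-02", 2),
    ("2017-02-01", 1),
    ("2017-02-02", 2),
]

# Same shape as the answer's query: a list of end dates joined back to the log
# on a date range, then COUNT(DISTINCT ...) per end date.
# Assumption: SQLite's date(x, '-30 days') replaces Redshift interval arithmetic.
QUERY = """
SELECT d.end_date,
       COUNT(DISTINCT l.userid) AS distinct_users
FROM (SELECT DISTINCT dt AS end_date FROM log) AS d
JOIN log AS l
  ON l.dt BETWEEN date(d.end_date, '-30 days') AND d.end_date
GROUP BY d.end_date
ORDER BY d.end_date
"""


def rolling_mau(rows):
    """Return (end_date, distinct_users) pairs for a trailing date window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (dt TEXT, userid INTEGER)")
    conn.executemany("INSERT INTO log VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for end_date, mau in rolling_mau(ROWS):
        print(end_date, mau)
```

With these sample rows, 2017-01-01 yields 3 and 2017-01-02 yields 4, matching the output the question asks for: user 2 appears on both days but is counted once.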
