Redshift - Calculate monthly active users
Problem description
I have a table which looks like this:
Date | User_ID
2017-1-1 | 1
2017-1-1 | 2
2017-1-1 | 4
2017-1-2 | 3
2017-1-2 | 2
... | ..
... | ..
... | ..
... | ..
2017-2-1 | 1
2017-2-2 | 2
... | ..
... | ..
... | ..
I'd like to calculate monthly active users over a rolling 30-day period. I know Redshift does not support COUNT(DISTINCT) as a window function. What can I do to get the following output?
Date | MAU
2017-1-1 | 3
2017-1-2 | 4 <- We don't want to count user_id 2 twice.
... | ..
... | ..
... | ..
2017-2-1 | ..
2017-2-2 | ..
... | ..
... | ..
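To pin down what the MAU column is meant to contain, here is a brute-force reference in Python. It assumes "rolling 30-day period" means the 30 calendar days ending on, and including, each date; the function name rolling_distinct_users is hypothetical, and only the rows actually shown in the sample table are used (the "..." rows are unknown).

```python
from collections import defaultdict
from datetime import date, timedelta


def rolling_distinct_users(rows, window_days=30):
    """Hypothetical helper: for every day with activity, count the distinct
    users active in the window_days-day window ending on (and including) it."""
    users_by_day = defaultdict(set)
    for day, user_id in rows:
        users_by_day[day].add(user_id)

    result = {}
    for end_day in sorted(users_by_day):
        start_day = end_day - timedelta(days=window_days - 1)
        seen = set()  # a set, so a user active on several days counts once
        for day, users in users_by_day.items():
            if start_day <= day <= end_day:
                seen |= users
        result[end_day] = len(seen)
    return result


# Only the rows visible in the question's sample table.
sample = [
    (date(2017, 1, 1), 1), (date(2017, 1, 1), 2), (date(2017, 1, 1), 4),
    (date(2017, 1, 2), 3), (date(2017, 1, 2), 2),
    (date(2017, 2, 1), 1), (date(2017, 2, 2), 2),
]
mau = rolling_distinct_users(sample)
print(mau[date(2017, 1, 1)], mau[date(2017, 1, 2)])  # 3 4
```

This is quadratic in the number of active days, so it is only useful for checking small examples, not as a replacement for a query.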
I attempted to do this (and clearly failed). Here's my code:
SELECT event_date
      ,sum(user_count) mau_count
      ,CASE
         WHEN event_date = date_trunc('week', event_date)
           THEN 1
         ELSE 0
       END week_starting
FROM (
  SELECT event_date
        ,count(*) OVER (PARTITION BY event_date ORDER BY event_date ROWS BETWEEN 30 PRECEDING
                        AND CURRENT ROW
                       ) AS user_count -- I know this is wrong. Just my attempt :)
  FROM (
    SELECT DISTINCT (user_id)
          ,event_date
    FROM event_table
  ) daily_distinct_users
  GROUP BY event_date
) cumulative_daily_distinct_users
GROUP BY event_date;
Please let me know how I can get the MAU count accurately. Thanks!
Recommended answer
This one seems to work (column names in the log table are dt and userid):
SELECT
  end_date,
  -- Distinct users active from 30 days before end_date through end_date (inclusive)
  COUNT(DISTINCT userid) AS distinct_users
FROM log
JOIN
  ( -- The list of dates that appear in the first output column
    SELECT DISTINCT dt AS end_date
    FROM log
    WHERE dt BETWEEN date '2017-01-01' AND date '2017-01-31'
  ) AS dates -- a derived table in FROM/JOIN needs an alias
  ON log.dt BETWEEN dates.end_date - interval '30 days' AND dates.end_date
GROUP BY end_date
ORDER BY end_date;
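As a quick sanity check of this approach, the sketch below runs the same self-join in SQLite through Python's sqlite3 module, using only the rows visible in the question's sample table. SQLite is not Redshift: it has no interval type, so date(end_date, '-30 days') is substituted for end_date - interval '30 days', and dates are stored as ISO 'YYYY-MM-DD' strings so that date() and BETWEEN behave correctly.

```python
import sqlite3

# Rows visible in the question's sample table, with the answer's column
# names (dt, userid). ISO dates so SQLite's date() and string comparison work.
rows = [
    ("2017-01-01", 1), ("2017-01-01", 2), ("2017-01-01", 4),
    ("2017-01-02", 3), ("2017-01-02", 2),
    ("2017-02-01", 1), ("2017-02-02", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (dt TEXT, userid INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, ?)", rows)

# Same shape as the answer's query; date(x, '-30 days') stands in for
# Redshift's x - interval '30 days'.
query = """
SELECT end_date,
       COUNT(DISTINCT userid) AS distinct_users
FROM log
JOIN (
    SELECT DISTINCT dt AS end_date
    FROM log
    WHERE dt BETWEEN '2017-01-01' AND '2017-01-31'
) AS dates
  ON log.dt BETWEEN date(dates.end_date, '-30 days') AND dates.end_date
GROUP BY end_date
ORDER BY end_date
"""
result = conn.execute(query).fetchall()
print(result)  # [('2017-01-01', 3), ('2017-01-02', 4)]
```

On these rows it reproduces the first two values of the expected output, with user_id 2 counted only once for 2017-01-02; the February rows are excluded by the WHERE clause on the date list.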
Basically, the sub-select generates the list of end_date values that appear in the first output column. Each end_date is then joined to the log rows dated from 30 days before it through the end_date itself, and COUNT(DISTINCT userid) counts each user only once within that window.
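Because BETWEEN is inclusive at both ends, for a date-typed dt the join condition end_date - interval '30 days' AND end_date covers 31 calendar days: the 30 days before end_date plus end_date itself. If the window should be exactly 30 days including the current date, one possible adjustment (not part of the original answer) is end_date - interval '29 days'. The sketch below checks the boundary in SQLite on the visible sample rows; distinct_users_on is a hypothetical helper, and date(?, '-N days') again stands in for Redshift's interval arithmetic.

```python
import sqlite3

# Rows visible in the question's sample table (ISO dates for SQLite).
rows = [
    ("2017-01-01", 1), ("2017-01-01", 2), ("2017-01-01", 4),
    ("2017-01-02", 3), ("2017-01-02", 2),
    ("2017-02-01", 1), ("2017-02-02", 2),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (dt TEXT, userid INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, ?)", rows)


def distinct_users_on(conn, end_date, lookback_days):
    """Hypothetical helper: distinct userid with dt in
    [end_date - lookback_days, end_date], both ends inclusive like BETWEEN."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT userid) FROM log WHERE dt BETWEEN date(?, ?) AND ?",
        (end_date, f"-{lookback_days} days", end_date),
    ).fetchone()
    return count


print(distinct_users_on(conn, "2017-02-01", 30))  # 3: window starts 2017-01-02
print(distinct_users_on(conn, "2017-02-01", 29))  # 1: window starts 2017-01-03
```

With a 30-day lookback the users active on 2017-01-02 (2 and 3) still count toward 2017-02-01; with a 29-day lookback they drop out, which is the difference between a 31-day and a 30-day window.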