Cohort analysis with Amazon Redshift / PostgreSQL
Question
I'm trying to analyze user retention using a cohort analysis based on event data stored in Redshift.
For example, in Redshift I have:
timestamp         action        user_id
----------------  ------------  -------
2015-05-05 12:00  homepage      1
2015-05-05 12:01  product page  1
2015-05-05 12:02  homepage      2
2015-05-05 12:03  checkout      1
I would like to extract the daily retention cohorts. For example:
signup_day  users_count  d1   d2  d3  d4  d5  d6  d7
----------  -----------  ---  --  --  --  --  --  --
2015-05-05  100          80   60  40  20  17  16  12
2015-05-06  150          120  90  60  30  22  18  15
Here signup_day is the first date on which we have a record of a user action, users_count is the total number of users who signed up on signup_day, d1 is the number of those users who performed any action one day after signup_day, d2 is the number who did so two days after, and so on.
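As a rough illustration of these definitions, the following Python sketch derives each user's signup day from their earliest event and groups active users by the number of days since signup. The function name daily_cohorts and the in-memory row layout are assumptions made for this example; the rows are the four sample events shown above.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the events table above: (timestamp, action, user_id).
events = [
    ("2015-05-05 12:00", "homepage", 1),
    ("2015-05-05 12:01", "product page", 1),
    ("2015-05-05 12:02", "homepage", 2),
    ("2015-05-05 12:03", "checkout", 1),
]


def daily_cohorts(rows):
    """Return {signup_day: {day_offset: set of active user ids}}."""
    parsed = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M").date(), user_id)
        for ts, _action, user_id in rows
    ]

    # signup_day is the date of each user's earliest recorded action.
    signup_day = {}
    for day, user_id in parsed:
        if user_id not in signup_day or day < signup_day[user_id]:
            signup_day[user_id] = day

    # Offset 0 is the signup day itself, offset 1 corresponds to d1, and so on.
    cohorts = defaultdict(lambda: defaultdict(set))
    for day, user_id in parsed:
        offset = (day - signup_day[user_id]).days
        cohorts[signup_day[user_id]][offset].add(user_id)
    return cohorts


cohorts = daily_cohorts(events)
for day in sorted(cohorts):
    counts = {offset: len(users) for offset, users in sorted(cohorts[day].items())}
    print(day.isoformat(), counts)
```

With only the four sample rows, both users belong to the 2015-05-05 cohort and are active only at offset 0, so users_count for that day is 2 and every later column would be empty.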
Is there a better way to represent the retention analysis data?
What would be the best query to achieve this with Amazon Redshift? Is it possible to do it with a single query?
Recommended answer
Eventually I found that the query below satisfies my requirements.
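-- "table" is a placeholder for the events table; substitute your own table name.
WITH
users AS (
    -- Each user's first recorded action, truncated to the day, is the signup day
    SELECT
        user_id,
        DATE_TRUNC('day', MIN("timestamp")) AS activated_at
    FROM table
    GROUP BY 1
),
events AS (
    SELECT
        user_id,
        action,
        "timestamp" AS occurred_at
    FROM table
)
SELECT
    DATE_TRUNC('day', u.activated_at) AS signup_date,
    -- Whole days elapsed between the signup day and the event
    TRUNC(EXTRACT('EPOCH' FROM e.occurred_at - u.activated_at) / (3600 * 24)) AS user_period,
    COUNT(DISTINCT e.user_id) AS retained_users
FROM users u
JOIN events e
    ON e.user_id = u.user_id
    AND e.occurred_at >= u.activated_at
-- Only cohorts that signed up within the last 11 days
WHERE u.activated_at >= GETDATE() - INTERVAL '11 day'
GROUP BY 1, 2
ORDER BY 1, 2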
It produces a slightly different table from the one I described above (but one that suits my needs better), with one row per signup date and day offset:
signup_date  user_period  retained_users
-----------  -----------  --------------
2015-05-05   0            80
2015-05-05   1            60
2015-05-05   2            40
2015-05-05   3            20
2015-05-06   0            100
2015-05-06   1            80
2015-05-06   2            40
2015-05-06   3            20
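If the wide layout from the question is still wanted, the long result can be pivoted after it is fetched. The sketch below is one possible way to do that in plain Python; the pivot function and the hard-coded rows list are assumptions for illustration (the rows simply repeat the sample output above), and a missing (signup_date, user_period) pair is assumed to mean zero retained users.

```python
# Rows as returned by the query above: (signup_date, user_period, retained_users).
rows = [
    ("2015-05-05", 0, 80),
    ("2015-05-05", 1, 60),
    ("2015-05-05", 2, 40),
    ("2015-05-05", 3, 20),
    ("2015-05-06", 0, 100),
    ("2015-05-06", 1, 80),
    ("2015-05-06", 2, 40),
    ("2015-05-06", 3, 20),
]


def pivot(long_rows):
    """Turn long (date, period, count) rows into one wide row per signup date."""
    periods = sorted({period for _date, period, _count in long_rows})

    by_date = {}
    for signup_date, period, retained in long_rows:
        by_date.setdefault(signup_date, {})[period] = retained

    header = ["signup_date"] + [f"d{period}" for period in periods]
    # Assumption: a period with no row means nobody from that cohort was active.
    body = [
        [signup_date] + [by_date[signup_date].get(period, 0) for period in periods]
        for signup_date in sorted(by_date)
    ]
    return header, body


header, body = pivot(rows)
for line in [header] + body:
    print("  ".join(str(cell).ljust(11) for cell in line))
```

Here d0 is the number of users active on the signup day itself, which plays the role of users_count in the layout from the question.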