Cohort analysis with Amazon Redshift / PostgreSQL


Problem description

I'm trying to analyze user retention using a cohort analysis based on event data stored in Redshift.

For example, in Redshift I have:

timestamp          action        user id
---------          ------        -------
2015-05-05 12:00   homepage      1
2015-05-05 12:01   product page  1
2015-05-05 12:02   homepage      2
2015-05-05 12:03   checkout      1

I would like to extract the daily retention cohort. For example:

signup_day  users_count d1  d2  d3  d4  d5  d6  d7 
----------  ----------- --  --  --  --  --  --  --  
2015-05-05  100         80  60  40  20  17  16  12
2015-05-06  150         120 90  60  30  22  18  15

Here signup_day is the first date on which we have a record of an action by the user, users_count is the total number of users who signed up on signup_day, d1 is the number of those users who performed any action one day after signup_day, d2 the number who did so two days after, and so on.

Is there a better way to represent the retention analysis data?

What would be the best query to achieve that with Amazon Redshift? Is it possible to do it with a single query?

Recommended answer

Eventually I found the query below to satisfy my requirements.

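WITH

-- Signup day of each user: the day of their first recorded action.
-- "table" is the placeholder used in the question; substitute the real event table name.
users AS (
  SELECT user_id,
         DATE_TRUNC('day', MIN(timestamp)) AS activated_at
    FROM table
   GROUP BY 1
),

-- Every recorded event, with the timestamp column renamed for readability.
events AS (
  SELECT user_id,
         action,
         timestamp AS occurred_at
    FROM table
)

SELECT DATE_TRUNC('day', u.activated_at) AS signup_date,
       -- Whole days elapsed between the signup day and the event (0 = the signup day itself)
       TRUNC(EXTRACT('EPOCH' FROM e.occurred_at - u.activated_at) / (3600 * 24)) AS user_period,
       COUNT(DISTINCT e.user_id) AS retained_users
  FROM users u
  JOIN events e
    ON e.user_id = u.user_id
   AND e.occurred_at >= u.activated_at
       -- Only cohorts from the last 11 days; getdate() is Redshift-specific (PostgreSQL: now())
 WHERE u.activated_at >= getdate() - INTERVAL '11 day'
 GROUP BY 1, 2
 ORDER BY 1, 2;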

It produces a slightly different table than I described above (but is better for my needs):

signup_date  user_period  retained_users
-----------  -----------  --------------
2015-05-05   0            80
2015-05-05   1            60
2015-05-05   2            40
2015-05-05   3            20
2015-05-06   0            100
2015-05-06   1            80
2015-05-06   2            40
2015-05-06   3            20
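
If the wide layout from the question (one row per signup day with d1 to d7 columns) is needed, the same day-offset calculation can be pivoted with conditional aggregation. The following is only a sketch under assumptions, not part of the original answer: event_log is a hypothetical name standing in for the event table, the column is assumed to be called timestamp as in the question, and it is quoted as a precaution in case the name is treated as a reserved word.

```sql
-- Sketch only: event_log is a hypothetical stand-in for the real event table.
WITH users AS (
  -- Signup day = day of each user's first recorded action
  SELECT user_id,
         DATE_TRUNC('day', MIN("timestamp")) AS activated_at
    FROM event_log
   GROUP BY user_id
),
activity AS (
  -- One row per user and day offset (0 = signup day) on which the user acted
  SELECT DISTINCT
         u.user_id,
         u.activated_at,
         TRUNC(EXTRACT(EPOCH FROM e."timestamp" - u.activated_at) / (3600 * 24)) AS user_period
    FROM users u
    JOIN event_log e
      ON e.user_id = u.user_id
)
SELECT activated_at AS signup_day,
       COUNT(DISTINCT user_id) AS users_count,
       -- Each CASE yields user_id only for the matching day offset, NULL otherwise;
       -- COUNT ignores NULLs, so each column counts users active on that day.
       COUNT(DISTINCT CASE WHEN user_period = 1 THEN user_id END) AS d1,
       COUNT(DISTINCT CASE WHEN user_period = 2 THEN user_id END) AS d2,
       COUNT(DISTINCT CASE WHEN user_period = 3 THEN user_id END) AS d3,
       COUNT(DISTINCT CASE WHEN user_period = 4 THEN user_id END) AS d4,
       COUNT(DISTINCT CASE WHEN user_period = 5 THEN user_id END) AS d5,
       COUNT(DISTINCT CASE WHEN user_period = 6 THEN user_id END) AS d6,
       COUNT(DISTINCT CASE WHEN user_period = 7 THEN user_id END) AS d7
  FROM activity
 GROUP BY 1
 ORDER BY 1;
```

This sketch leaves out the 11-day filter, so it returns every cohort. Cohorts younger than seven days show 0 in the columns for days that have not happened yet, which is worth keeping in mind when reading the most recent rows.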
