How can I create a weekly cohort analysis table using MySQL?

Question

Let's say you have a user table that has at least the date the user signed up and an id.

Now let's say you have a separate table that tracks an action like a payment that can happen at any point in the user's lifetime. (Say like an in-app purchase.) In that table we track the userId, payment date, and an id for the payment.

So we have something that looks like this to get our schema set up:

CREATE TABLE users (
  UserId INT,
  AddedDate DATETIME
);

CREATE TABLE payments (
  PaymentId INT,
  UserId INT,
  PaymentDate DATETIME
);

Now you want a table that shows weekly cohorts. A table that looks something like this:

Week       size w1  w2  w3  w4  w5  w6  w7
2017-08-28  1   0   0   0   1   0   0   0
2017-09-04  3   1   0   2   0   1   1   2
2017-09-11  2   0   0   1   0   0   0   1
2017-09-18  6   3   1   4   3   1   1   2
2017-09-25  2   1   1   1   0   1   2   0
2017-10-02  7   5   2   3   4   3   1   0
2017-10-09  7   4   5   1   2   5   0   0
2017-10-16  2   1   2   1   1   0   0   0
2017-10-23  7   5   4   4   3   0   0   0
2017-10-30  8   8   7   0   0   0   0   0
2017-11-06  5   5   2   0   0   0   0   0

So the first column has the week, and the second has the number of people who signed up that week. Say we look at week 2017-09-18. 6 people signed up that week. The 3 under the w1 column means that 3 of those 6 people made a purchase in the week they signed up. The 1 under w2 means that 1 of those 6 made a purchase in their second week after signing up, and so on.

What query would I use to get a table that looks like that?

Recommended answer

This query is modified from the one I wrote here: Cohort analysis in SQL

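Here is the final query:

SELECT
  -- Turn the year-week key back into the Monday that starts that week
  STR_TO_DATE(CONCAT(tb.cohort, ' Monday'), '%x-%v %W') AS date,
  size,
  w1,
  w2,
  w3,
  w4,
  w5,
  w6,
  w7
FROM (
  SELECT u.cohort,
    -- SUM over a boolean counts the rows where the comparison is true
    IFNULL(SUM(s.Offset = 0), 0) AS w1,
    IFNULL(SUM(s.Offset = 1), 0) AS w2,
    IFNULL(SUM(s.Offset = 2), 0) AS w3,
    IFNULL(SUM(s.Offset = 3), 0) AS w4,
    IFNULL(SUM(s.Offset = 4), 0) AS w5,
    IFNULL(SUM(s.Offset = 5), 0) AS w6,
    IFNULL(SUM(s.Offset = 6), 0) AS w7
  FROM (
    SELECT
      UserId,
      DATE_FORMAT(AddedDate, '%x-%v') AS cohort
    FROM users
  ) AS u
  LEFT JOIN (
    -- One row per user per week (counted from sign-up) in which they paid
    SELECT DISTINCT
      payments.UserId,
      FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate) / 7) AS Offset
    FROM payments
    LEFT JOIN users ON (users.UserId = payments.UserId)
  ) AS s ON s.UserId = u.UserId
  GROUP BY u.cohort
) AS tb
LEFT JOIN (
  SELECT DATE_FORMAT(AddedDate, '%x-%v') AS dt, COUNT(*) AS size
  FROM users
  GROUP BY dt
) AS size ON tb.cohort = size.dt
ORDER BY tb.cohort;

The core of this is that we grab the users and the dates they signed up, and format each date as a year-week number, since we are doing a weekly cohort. The key uses %x-%v (ISO year and ISO week number, with weeks starting on Monday), which is the same pair of specifiers the outer query uses to convert the key back to a date. Mixing specifiers from different week modes, such as %Y-%u for the key and %X-%V for the conversion, can shift the result by a week in years that do not start on a Sunday.

SELECT
  UserId,
  DATE_FORMAT(AddedDate, '%x-%v') AS cohort
FROM users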

Since we want to group by the cohort we have to put this in a subquery in the FROM part of the query.
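As a quick check of how a year-week key maps to a date and back, here is a minimal sketch using MySQL's ISO specifiers %x (ISO year) and %v (ISO week, Monday first). The literal date is only an illustration and is not taken from the original data.

```sql
-- A Wednesday in the week that starts on Monday 2017-09-18
SELECT DATE_FORMAT('2017-09-20', '%x-%v') AS cohort;
-- returns '2017-38'

-- Appending a weekday name lets STR_TO_DATE resolve the key to one exact day
SELECT STR_TO_DATE(CONCAT('2017-38', ' Monday'), '%x-%v %W') AS week_start;
-- returns 2017-09-18
```

The weekday is needed because a year and week number alone do not identify a single date.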

Then we want to join the payment information to the users.

SELECT DISTINCT
  payments.UserId,
  FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
  FROM payments
  LEFT JOIN users ON (users.UserId = payments.UserId)

This gets the unique weekly payment events per user, keyed by the number of weeks they have been a user. We use DISTINCT because if a user made two purchases in one week, we don't want to count that as two users.
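To make the Offset arithmetic concrete, here is a small worked example with hypothetical timestamps (a sign-up on 2017-09-18 and a payment nine days later). Note that weeks are counted from each user's own sign-up date rather than from calendar week boundaries.

```sql
-- DATEDIFF compares only the date parts, so the times of day are ignored
SELECT FLOOR(DATEDIFF('2017-09-27 10:00:00', '2017-09-18 23:00:00') / 7) AS Offset;
-- DATEDIFF gives 9, and FLOOR(9 / 7) = 1, which the query reports under w2
```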

We can't drive the query from the payments table alone, because some users sign up and never make a payment, and they still need to count toward their cohort. So we select from the users table and LEFT JOIN the payment events onto it.

You then group by the week - u.cohort. Then you aggregate on the week numbers to find out how many people made payments the weeks after they signed up.

The version of MySQL I used had sql_mode set to only_full_group_by. So to get the cohort size, I put the bulk of the query in a subquery so that I could join it to a second subquery on users that counts the size of each cohort.
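An alternative that also satisfies only_full_group_by is to count distinct users inside the grouped query, which avoids the second join. This is only a sketch, not the approach used in the answer: it assumes UserId is unique in users, it builds the cohort key with the ISO specifiers %x-%v, and it shows just the first two week columns.

```sql
SELECT
  u.cohort,
  -- The join can yield several rows per user, so count each user once
  COUNT(DISTINCT u.UserId) AS size,
  IFNULL(SUM(s.Offset = 0), 0) AS w1,
  IFNULL(SUM(s.Offset = 1), 0) AS w2
FROM (
  SELECT UserId, DATE_FORMAT(AddedDate, '%x-%v') AS cohort
  FROM users
) AS u
LEFT JOIN (
  SELECT DISTINCT
    p.UserId,
    FLOOR(DATEDIFF(p.PaymentDate, us.AddedDate) / 7) AS Offset
  FROM payments AS p
  JOIN users AS us ON us.UserId = p.UserId
) AS s ON s.UserId = u.UserId
GROUP BY u.cohort
ORDER BY u.cohort;
```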

More notes:

Filtering by week is simple: add tb.cohort > start and tb.cohort < end, where the start and end values are year-week strings formatted the same way as the cohort key. To make the query more efficient, you'll probably also want to filter out payment events that fall outside the date range, so you're not joining on data you don't need.
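One way to apply such a filter is sketched below. It filters on the raw date columns inside the two subqueries instead of on the formatted key, which is a substitution on my part rather than what the answer describes; the date range is hypothetical, and the cohort key uses the ISO specifiers %x-%v.

```sql
-- Cohorts that signed up from 2017-09-04 up to, but not including, 2017-11-13
SELECT
  UserId,
  DATE_FORMAT(AddedDate, '%x-%v') AS cohort
FROM users
WHERE AddedDate >= '2017-09-04'
  AND AddedDate <  '2017-11-13';

-- Payment events restricted to the same users, dropping payments before the window
SELECT DISTINCT
  p.UserId,
  FLOOR(DATEDIFF(p.PaymentDate, u.AddedDate) / 7) AS Offset
FROM payments AS p
JOIN users AS u ON u.UserId = p.UserId
WHERE u.AddedDate >= '2017-09-04'
  AND u.AddedDate <  '2017-11-13'
  AND p.PaymentDate >= '2017-09-04';
```

Comparing the unformatted columns lets MySQL use an index on AddedDate or PaymentDate if one exists.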

You may want to consider using a calendar table to cover weeks in which no users signed up.
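A minimal sketch of that idea follows. The calendar_weeks table and its week_start column are hypothetical names introduced here, and only the cohort size is computed to keep the example short.

```sql
-- Hypothetical calendar table holding one row per week, keyed by its Monday
CREATE TABLE calendar_weeks (
  week_start DATE PRIMARY KEY
);

INSERT INTO calendar_weeks (week_start)
VALUES ('2017-08-28'), ('2017-09-04'), ('2017-09-11');

-- Weeks without sign-ups still appear, with a size of 0
SELECT
  c.week_start,
  COUNT(u.UserId) AS size
FROM calendar_weeks AS c
LEFT JOIN users AS u
  ON u.AddedDate >= c.week_start
 AND u.AddedDate <  c.week_start + INTERVAL 7 DAY
GROUP BY c.week_start
ORDER BY c.week_start;
```

COUNT(u.UserId) ignores the NULLs produced by the LEFT JOIN, which is why empty weeks report 0 and not 1.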

Here's a fiddle with everything working: http://sqlfiddle.com/#!9/172dbe/1
