Weekly Active Users for each day from log

Problem description

I was wondering if someone could help me with some SQL that returns the number of unique users recorded in a database log table over a period of two or more days (let's use 7 days as a reference).

My log table contains a timestamp (ts) and user_id in each row, representing activity from that user at that time.

The following query returns the Daily Active Users or DAU from this log:

SELECT FLOOR(ts / 86400) AS day, COUNT(DISTINCT user_id) AS dau
FROM log
GROUP BY day ORDER BY day ASC

Now let's say I would like to add to this single query (or at least retrieve in the most efficient way possible) the Weekly Active Users, or WAU: the total unique users logged over a period of 7 days. However, I don't want to divide my time into non-overlapping weeks. What I need is to count, for each day, the distinct user_ids seen during that day and the 6 previous days.

For example:

day users wau
1   1,2   2
4   1,3   3
7   3,4,5 5
8   5     4    (user_id 2 lost from count)
15  2     1    (user_ids 1,3,4,5 lost from count)
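
As a cross-check of the definition, here is a minimal Python sketch that computes the rolling count by brute force from the sample rows above. The `activity` mapping and the `rolling_active` helper are illustrative names, not part of the original question. Note that under a strict "that day and the 6 previous days" window, day 15 covers only days 9 to 15, so this sketch counts a single user for that day.

```python
# Sample activity from the example table: day number -> user_ids seen that day.
activity = {1: {1, 2}, 4: {1, 3}, 7: {3, 4, 5}, 8: {5}, 15: {2}}

def rolling_active(activity, window=7):
    """For each logged day, count distinct users seen on that day
    and on the (window - 1) days before it."""
    result = {}
    for day in sorted(activity):
        seen = set()
        for d in range(day - window + 1, day + 1):
            seen |= activity.get(d, set())
        result[day] = len(seen)
    return result

wau = rolling_active(activity)
print(wau)
```

This is only a reference for checking results on small data; it rescans the window for every day and is not meant as a replacement for doing the work in SQL.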

Thank you for any help you can provide and feel free to ask via comment if you need further clarification.

Solution

To get a "Weekly Active Users" count (per my understanding of your specification: "for each day, the count of distinct user_ids seen during that day and the previous six days"), a query along the lines of the one below could be used. (The query also returns the "Daily Active Users" count.)

SELECT d.day
     , COUNT(DISTINCT u.user_id) AS wau
     , COUNT(DISTINCT IF(u.day=d.day,u.user_id,NULL)) AS dau
  FROM ( SELECT FLOOR(k.ts/86400) AS `day`
           FROM `log` k
          GROUP BY `day`
       ) d
  JOIN ( SELECT FLOOR(l.ts/86400) AS `day`
              , l.user_id
           FROM `log` l
          GROUP BY `day`, l.user_id
       ) u
    ON u.day <= d.day
   AND u.day > d.day - 7
 GROUP BY d.day
 ORDER BY d.day

(I have not yet run a test of this; but I will later, and I will update this statement if any corrections are needed.)

This query joins the list of users for a given day (from the u rowsource) to the set of days found in the log table (the d rowsource). Note the literal "7" in the join predicate (the ON clause); that is what gets the user list "matched" to each day and its previous 6 days.
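
To see the join at work, the following sketch runs an adapted version of the query against an in-memory SQLite database from Python. This is an assumption-laden stand-in, not MySQL: SQLite has no IF(), so a CASE expression is used instead, and integer division replaces FLOOR(). The sample rows and the placement of each ts at noon of its day number are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ts INTEGER, user_id INTEGER)")

# Rows from the question's example; ts is epoch seconds at noon of each day.
sample = {1: [1, 2], 4: [1, 3], 7: [3, 4, 5], 8: [5], 15: [2]}
rows = [(day * 86400 + 43200, uid) for day, uids in sample.items() for uid in uids]
conn.executemany("INSERT INTO log VALUES (?, ?)", rows)

query = """
SELECT d.day,
       COUNT(DISTINCT u.user_id) AS wau,
       COUNT(DISTINCT CASE WHEN u.day = d.day THEN u.user_id END) AS dau
  FROM (SELECT ts / 86400 AS day FROM log GROUP BY day) d
  JOIN (SELECT ts / 86400 AS day, user_id FROM log GROUP BY day, user_id) u
    ON u.day <= d.day
   AND u.day > d.day - 7
 GROUP BY d.day
 ORDER BY d.day
"""
result = conn.execute(query).fetchall()
for day, wau, dau in result:
    print(day, wau, dau)
```

Each d row picks up every u row whose day falls in the 7-day range ending on d.day, and the CASE expression narrows the second count back down to the day itself.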

Note that this could also be extended to get the distinct user count over the past 3 days, for example, by adding another expression in the SELECT list.

     , COUNT(DISTINCT IF(u.day<=d.day AND u.day>d.day-3,u.user_id,NULL)) AS 3day

That literal "7" could be increased to get a larger range, and the literal 3 in the expression above could be changed to any number of days. We just need to make sure the range in the join predicate is at least as wide as the widest range counted in the SELECT list, so that each row from d is joined to enough previous-day rows from u.
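
A small sketch of that extension, again using Python with an in-memory SQLite database as a stand-in for MySQL (CASE instead of IF(), integer division instead of FLOOR()). The alias `three_day` is used here in place of `3day` to avoid any quoting questions in SQLite; the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ts INTEGER, user_id INTEGER)")
sample = {1: [1, 2], 4: [1, 3], 7: [3, 4, 5], 8: [5], 15: [2]}
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [(day * 86400, uid) for day, uids in sample.items() for uid in uids],
)

# The join keeps a 7-day range; the extra expression counts only the
# users whose day falls in the narrower 3-day range inside it.
query = """
SELECT d.day,
       COUNT(DISTINCT u.user_id) AS wau,
       COUNT(DISTINCT CASE WHEN u.day <= d.day AND u.day > d.day - 3
                           THEN u.user_id END) AS three_day
  FROM (SELECT ts / 86400 AS day FROM log GROUP BY day) d
  JOIN (SELECT ts / 86400 AS day, user_id FROM log GROUP BY day, user_id) u
    ON u.day <= d.day
   AND u.day > d.day - 7
 GROUP BY d.day
 ORDER BY d.day
"""
result = conn.execute(query).fetchall()
print(result)
```

If the range in the ON clause were narrower than 3 days, the `three_day` column would silently undercount, because the rows it needs would never reach the SELECT list.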

PERFORMANCE NOTE: Due to the inline views (or derived tables, as MySQL calls them), this query may not be very fast, since the result sets for those inline views have to be materialized into intermediate MyISAM tables.

The inline view aliased as u may not be optimal; it might be faster to join directly to the log table. I was thinking in terms of getting a unique list of users for a given day, which is what the query in that inline view returns, and it made it easier for me to conceptualize what was going on. I was also thinking that if the same user had hundreds of rows for a day, the inline view would weed out the duplicates before the join to the other days. A WHERE clause to limit the number of days returned would best be added inside both the u and d inline views. (The u inline view would need to include the 6 days before the first day in d, so that the earliest reported day still has its full window.)
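
One way the WHERE clauses might look is sketched below, using Python with an in-memory SQLite database as a stand-in for MySQL. The helper `wau_between` and its parameters are hypothetical names for illustration. Since u supplies the earlier-day users, it is u's range that has to start 6 days before the first reported day; calling the helper with `lookback=0` shows the undercount that results otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ts INTEGER, user_id INTEGER)")
sample = {1: [1, 2], 4: [1, 3], 7: [3, 4, 5], 8: [5], 15: [2]}
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [(day * 86400, uid) for day, uids in sample.items() for uid in uids],
)

def wau_between(conn, first_day, last_day, lookback=6):
    """WAU for reported days first_day..last_day (inclusive).

    d is limited to the reported days; u starts `lookback` days earlier
    so the first reported day still sees its full 7-day range."""
    query = """
    SELECT d.day, COUNT(DISTINCT u.user_id) AS wau
      FROM (SELECT ts / 86400 AS day FROM log
             WHERE ts >= :d_from * 86400 AND ts < (:d_to + 1) * 86400
             GROUP BY day) d
      JOIN (SELECT ts / 86400 AS day, user_id FROM log
             WHERE ts >= (:d_from - :lookback) * 86400
               AND ts < (:d_to + 1) * 86400
             GROUP BY day, user_id) u
        ON u.day <= d.day AND u.day > d.day - 7
     GROUP BY d.day
     ORDER BY d.day
    """
    params = {"d_from": first_day, "d_to": last_day, "lookback": lookback}
    return conn.execute(query, params).fetchall()

full = wau_between(conn, 7, 8)                  # u covers days 1..8
truncated = wau_between(conn, 7, 8, lookback=0)  # u covers days 7..8 only
print(full, truncated)
```

The filters are written against the raw ts column rather than the computed day, on the assumption that this leaves the predicate usable by an index on ts if one exists.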


On another note, if the ts column is a TIMESTAMP datatype, I would be more inclined to use a DATE(ts) expression to extract the date portion. But that would return a DATE datatype in the resultset rather than an integer, which would differ from the resultset you specified.

SELECT d.day
     , COUNT(DISTINCT u.user_id) AS wau
     , COUNT(DISTINCT IF(u.day=d.day,u.user_id,NULL)) AS dau
  FROM ( SELECT DATE(k.ts) AS `day`
           FROM `log` k
          GROUP BY `day`
       ) d
  JOIN ( SELECT DATE(l.ts) AS `day`
              , l.user_id
           FROM `log` l
          GROUP BY `day`, l.user_id
       ) u
    ON u.day <= d.day
   AND u.day > DATE_ADD(d.day, INTERVAL -7 DAY)
 GROUP BY d.day
 ORDER BY d.day
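
The relationship between the two day representations can be sketched in Python, assuming ts holds epoch seconds. The helper names and the sample timestamp are illustrative. The integer from FLOOR(ts / 86400) counts days since 1970-01-01 in UTC, so it lines up with a calendar date only when that date is also taken in UTC; MySQL's DATE() on a TIMESTAMP column follows the session time zone, so the day boundaries can differ when the session is not on UTC.

```python
from datetime import datetime, timedelta, timezone

def day_number(ts):
    """Integer day index, as in FLOOR(ts / 86400) for epoch seconds."""
    return ts // 86400

def utc_date(ts):
    """Calendar date in UTC for the same epoch-seconds timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()

ts = 1_700_000_000  # arbitrary sample timestamp
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc).date()

# Adding the day index to the epoch date recovers the UTC calendar date.
same_day = epoch + timedelta(days=day_number(ts)) == utc_date(ts)
print(day_number(ts), utc_date(ts), same_day)
```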

