Count number of consecutive visits


Question

Every time a logged-in user visits the website, their data is put into a table containing the userId and date (one or zero rows per user per day):

   444631 2011-11-07
   444631 2011-11-06
   444631 2011-11-05
   444631 2011-11-04
   444631 2011-11-02
   444631 2011-11-01

I need ready access to the number of consecutive visits when I pull the user data from the main user table. For this user, it would be 4.

Currently I'm doing this through a denormalized consecutivevisits counter in the main user table; however, for unknown reasons it sometimes resets. I want to try an approach that uses exclusively the data in the table above.

What's the best SQL query to get that number (4 in the example above)? Some users have hundreds of visits, and we have millions of registered users and hits per day.

EDIT: As per the comments below, I'm posting the code I currently use to do this. However, it has the problem that it sometimes resets for no reason, and it also reset the counter for everyone during the weekend, most likely because of the DST change.

// Called every page load for logged in users
public static function OnVisit($user)
{
    $lastVisit = $user->GetLastVisit(); /* Timestamp; db server is on the same timezone as www server */
    if(!$lastVisit)
        $delta = 2;
    else
    {
        $today = date('Y/m/d');

        if(date('Y/m/d', $lastVisit) == $today)
            $delta = 0;
        else if(date('Y/m/d', $lastVisit + (24 * 60 * 60)) == $today)
            $delta = 1;
        else
            $delta = 2;
    }

    if(!$delta)
        return;

    $visits = $user->GetConsecutiveVisits();
    $userId = $user->GetId();

    /* NOTE: t_dailyvisit is the table I pasted above. The table is unused;
     * I added it only to ensure that the counter sometimes really resets
     * even if the user visits the website, and I could confirm that. */
    q_Query("INSERT IGNORE INTO `t_dailyvisit` (`user`, `date`) VALUES ($userId, CURDATE())", DB_DATABASE_COMMON);

    /* User skipped 1 or more days.. */
    if($delta > 1)
        $visits = 1;
    else if($delta == 1)
        $visits += 1;

    q_Query("UPDATE `t_user` SET `consecutivevisits` = $visits, `lastvisit` = CURDATE(), `nvotesday` = 0 WHERE `id` = $userId", DB_DATABASE_COMMON);
    $user->ForceCacheExpire();
}
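
The DST suspicion is plausible. The UPDATE writes CURDATE() into lastvisit, so the stored value is presumably local midnight, and on the day the clocks go back, midnight plus 24 * 60 * 60 seconds is 23:00 on the same calendar day. The "visited yesterday" check then fails and the counter resets. Below is a minimal Python sketch of that effect, and of a calendar-date comparison that avoids it. The fixed UTC-4/UTC-5 offsets, the date of the change, and the helper name day_delta are assumptions for illustration, not part of the code above.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed offsets for illustration: summer time UTC-4, winter time UTC-5,
# with the clocks going back during 2011-11-06.
SUMMER = timezone(timedelta(hours=-4))
WINTER = timezone(timedelta(hours=-5))

# Last visit stored as local midnight on the day of the change.
last_visit = datetime(2011, 11, 6, 0, 0, tzinfo=SUMMER)

# Adding 24 hours' worth of seconds and reading the result in winter time
# lands on 23:00 of the same calendar day, not on the next day.
plus_24h = (last_visit + timedelta(seconds=24 * 60 * 60)).astimezone(WINTER)


def day_delta(last_visit_day: date, today: date) -> int:
    """Whole calendar days between two dates; unaffected by DST."""
    return (today - last_visit_day).days


print(plus_24h)                                         # 2011-11-06 23:00:00-05:00
print(day_delta(date(2011, 11, 6), date(2011, 11, 7)))  # 1
```

Comparing calendar dates rather than adding seconds to a timestamp keeps the delta at 1 for a visit on the following day, whatever the length of the day in between.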

Recommended answer

I missed the mysql tag and wrote up this solution. Sadly, this does not work in MySQL before version 8.0, as those versions do not support window functions.

I'm posting it anyway, as I put some effort into it. Tested with PostgreSQL. It would work similarly with Oracle or SQL Server (or any other decent RDBMS that supports window functions).

CREATE TEMP TABLE v(id int, visit date);
INSERT INTO v VALUES
 (444631, '2011-11-07')
,(444631, '2011-11-06')
,(444631, '2011-11-05')
,(444631, '2011-11-04')
,(444631, '2011-11-02')
,(444631, '2011-11-01')
,(444632, '2011-12-02')
,(444632, '2011-12-03')
,(444632, '2011-12-05');

Simple version

-- add 1 to "difference" to get number of days of the longest period
SELECT id, max(dur) + 1 as max_consecutive_days
FROM (

   -- calculate date difference of min and max in the group
   SELECT id, grp, max(visit) - min(visit) as dur
   FROM (

      -- consecutive days end up in a group
      SELECT *, sum(step) OVER (ORDER BY id, rn) AS grp
      FROM   (

         -- step up at the start of a new group of days
         SELECT id
               ,row_number() OVER w AS rn
               ,visit
               ,CASE WHEN COALESCE(visit - lag(visit) OVER w, 1) = 1
                THEN 0 ELSE 1 END AS step
         FROM   v
         WINDOW w AS (PARTITION BY id ORDER BY visit)
         ORDER  BY 1,2
         ) x
      ) y
      GROUP BY 1,2
   ) z
GROUP  BY 1
ORDER  BY 1
LIMIT  1;

Output:

   id   | max_consecutive_days
--------+----------------------
 444631 |                    4
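
For readers who want to follow the logic outside SQL, here is a rough Python sketch of the same idea: flag each row that starts a new run of days (the step column), keep a running sum of those flags as the group number, then take max minus min per group and add 1. The function name longest_run_by_steps and the dictionary layout are illustrative assumptions. The data mirrors the test table.

```python
from datetime import date

# Same rows as the test table v, keyed by user id.
visits = {
    444631: [date(2011, 11, d) for d in (7, 6, 5, 4, 2, 1)],
    444632: [date(2011, 12, d) for d in (2, 3, 5)],
}


def longest_run_by_steps(days):
    """Longest run of consecutive dates, via step flags and a running sum."""
    days = sorted(days)
    grp = 0
    spans = {}  # group number -> (first day, last day)
    for i, day in enumerate(days):
        # step is 1 when this row does not directly follow the previous day
        step = 1 if i > 0 and (day - days[i - 1]).days != 1 else 0
        grp += step  # running sum of the step flags
        first, last = spans.get(grp, (day, day))
        spans[grp] = (min(first, day), max(last, day))
    # date difference per group, plus 1, is the number of days in that run
    return max((last - first).days + 1 for first, last in spans.values())


for user_id, days in visits.items():
    print(user_id, longest_run_by_steps(days))
# 444631 4
# 444632 2
```

The sketch assumes at least one visit per user. An empty list would make max() raise a ValueError.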

Faster / Shorter

I later found an even better way. The grp numbers are not consecutive (though they never decrease). That doesn't matter, since they are just a means to an end:

SELECT id, max(dur) + 1 AS max_consecutive_days
FROM (
    SELECT id, grp, max(visit) - min(visit) AS dur
    FROM (
      -- subtracting row_number() from an integer day number
      -- yields the same "group number" (grp) for consecutive days
      SELECT id
            ,EXTRACT(epoch from visit)::int / 86400
           - row_number() OVER (PARTITION BY id ORDER BY visit) AS grp
            ,visit
      FROM   v
      ORDER  BY 1,2
      ) x
    GROUP BY 1,2
    ) y
GROUP  BY 1
ORDER  BY 1
LIMIT  1;

SQL Fiddle.
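
To see why the subtraction works, it can help to print the group keys. Within a run of consecutive days both the day number and the row number grow by 1, so their difference stays constant. Every skipped day bumps the difference up. The Python sketch below is only an illustration. It uses date.toordinal() as the day number in place of the epoch arithmetic, and the function names are made up for this example. It relies on there being at most one row per user per day, as stated in the question.

```python
from datetime import date


def group_keys(days):
    """Day number minus row number: constant within a run of consecutive days."""
    return [day.toordinal() - rn for rn, day in enumerate(sorted(days), start=1)]


def longest_run(days):
    """Longest run of consecutive dates, grouping rows by their group key."""
    spans = {}  # group key -> (first day, last day)
    for key, day in zip(group_keys(days), sorted(days)):
        first, last = spans.get(key, (day, day))
        spans[key] = (min(first, day), max(last, day))
    return max((last - first).days + 1 for first, last in spans.values())


sample = [date(2011, 11, d) for d in (7, 6, 5, 4, 2, 1)]
keys = group_keys(sample)
print([k - keys[0] for k in keys])  # [0, 0, 1, 1, 1, 1]
print(longest_run(sample))          # 4
```

The keys are shown relative to the first one. Only their equality within a run matters, not their absolute values.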

  • A procedural solution for a similar problem.
    You might be able to implement something similar in MySQL.
  • Closely related answers on dba.SE with extensive explanation here and here.
  • And on SO:
    GROUP BY and aggregate sequential numeric values
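
As a rough idea of what a procedural approach could look like, here is a single pass from the most recent visit backwards, sketched in Python. This is not the code from the linked answer, and the function name current_streak is hypothetical. Note that it counts the run ending at the latest visit, which is what the question's counter tracks, whereas the queries above return the longest run. For the sample data both are 4.

```python
from datetime import date, timedelta


def current_streak(days):
    """Length of the run of consecutive dates ending at the latest visit."""
    ordered = sorted(set(days), reverse=True)  # newest first
    if not ordered:
        return 0
    streak = 1
    for newer, older in zip(ordered, ordered[1:]):
        if newer - older != timedelta(days=1):
            break  # first gap ends the streak
        streak += 1
    return streak


print(current_streak([date(2011, 11, d) for d in (7, 6, 5, 4, 2, 1)]))  # 4
print(current_streak([date(2011, 12, d) for d in (2, 3, 5)]))           # 1
```

Whether a streak whose latest visit is older than yesterday should count as 0 is a separate decision that this sketch leaves to the caller.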
