Query to find all timestamps more than a certain interval apart


Question

I'm using Postgres to run some analytics on user activity. I have a table of all requests (pageviews) made by every user, with the timestamp of each request, and I'm trying to find the number of distinct sessions for every user. For the sake of simplicity, I treat every set of requests that is an hour or more apart from the others as a distinct session. The data looks something like this:

id | request_time               | user_id
 1 | 2014-01-12 08:57:16.725533 |    1233
 2 | 2014-01-12 08:57:20.944193 |    1234
 3 | 2014-01-12 09:15:59.713456 |    1233
 4 | 2014-01-12 10:58:59.713456 |    1234

How can I write a query to get the number of sessions per user?

Recommended answer

To start a new session after every gap >= 1 hour:

SELECT user_id, count(*) AS distinct_sessions
FROM (
   SELECT user_id
        ,(lag(request_time, 1, '-infinity') OVER (PARTITION BY user_id
                                                  ORDER BY request_time)
           <= request_time - '1h'::interval) AS step -- start new session
   FROM   tbl
   ) sub
WHERE  step
GROUP  BY user_id
ORDER  BY user_id;

This assumes request_time is NOT NULL.


  • In the subquery sub, check for every row whether a new session begins. The third parameter of lag() supplies the default -infinity, which is lower than any timestamp, so the first row of each user always starts a new session.

  • In the outer query, count how many times a new session started: eliminate rows where step is FALSE and count per user.
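
To check the logic by hand, the same rule can be traced outside the database. The following is a minimal Python sketch, not part of the original answer: the function name count_sessions and the in-memory rows list are hypothetical, and the data is the four sample rows from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (id, request_time, user_id)
rows = [
    (1, "2014-01-12 08:57:16.725533", 1233),
    (2, "2014-01-12 08:57:20.944193", 1234),
    (3, "2014-01-12 09:15:59.713456", 1233),
    (4, "2014-01-12 10:58:59.713456", 1234),
]


def count_sessions(rows, gap=timedelta(hours=1)):
    """Count sessions per user; a gap >= `gap` starts a new session."""
    times_by_user = defaultdict(list)          # PARTITION BY user_id
    for _id, request_time, user_id in rows:
        parsed = datetime.strptime(request_time, "%Y-%m-%d %H:%M:%S.%f")
        times_by_user[user_id].append(parsed)

    sessions = {}
    for user_id, times in sorted(times_by_user.items()):
        times.sort()                           # ORDER BY request_time
        previous = None                        # stands in for '-infinity'
        count = 0
        for t in times:
            # Same test as: lag(request_time) <= request_time - interval '1h'
            if previous is None or previous <= t - gap:
                count += 1                     # step is TRUE: new session
            previous = t
        sessions[user_id] = count
    return sessions


print(count_sessions(rows))  # {1233: 1, 1234: 2}
```

User 1233's two requests are about 18 minutes apart, so they form one session; user 1234's requests are about two hours apart, so they form two.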

If you really wanted to count the hours in which at least one request happened (I don't think you do, but another answer assumes as much), you would:

SELECT user_id
     , count(DISTINCT date_trunc('hour', request_time)) AS hours_with_req
FROM   tbl
GROUP  BY 1
ORDER  BY 1;
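
The difference between the two interpretations shows up even on the sample rows from the question. The following is a minimal Python sketch, not part of the original answer: the function name hours_with_requests is hypothetical, and truncating with datetime.replace is assumed here as a stand-in for date_trunc('hour', ...).

```python
from datetime import datetime

# Sample rows from the question: (id, request_time, user_id)
rows = [
    (1, "2014-01-12 08:57:16.725533", 1233),
    (2, "2014-01-12 08:57:20.944193", 1234),
    (3, "2014-01-12 09:15:59.713456", 1233),
    (4, "2014-01-12 10:58:59.713456", 1234),
]


def hours_with_requests(rows):
    """Count distinct clock hours with at least one request, per user."""
    hours_by_user = {}
    for _id, request_time, user_id in rows:
        t = datetime.strptime(request_time, "%Y-%m-%d %H:%M:%S.%f")
        # Mirrors date_trunc('hour', request_time): zero out the smaller fields.
        hour = t.replace(minute=0, second=0, microsecond=0)
        hours_by_user.setdefault(user_id, set()).add(hour)
    # Mirrors count(DISTINCT ...) with GROUP BY / ORDER BY user_id.
    return {user: len(hours) for user, hours in sorted(hours_by_user.items())}


print(hours_with_requests(rows))  # {1233: 2, 1234: 2}
```

User 1233 gets a count of 2 here because the requests at 08:57 and 09:15 fall into different clock hours, although they are only about 18 minutes apart and would form a single session under the gap rule.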
