计算彼此相距10秒以内的行数 [英] Count number of rows that are not within 10 seconds of each other

查看:45
本文介绍了计算彼此相距10秒以内的行数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我跟踪网络访问者.我存储了IP地址以及访问的时间戳.

I track web visitors. I store the IP address as well as the timestamp of the visit.

ip_address    time_stamp
180.2.79.3  1301654105
180.2.79.3  1301654106
180.2.79.3  1301654354
180.2.79.3  1301654356
180.2.79.3  1301654358
180.2.79.3  1301654366
180.2.79.3  1301654368
180.2.79.3  1301654422

我有一个查询以获取总曲目:

I have a query to get total tracks:

SELECT COUNT(*) AS tracks FROM tracking

但是,我现在要忽略每次访问后10秒内多次访问过的用户的访问.由于我不认为这是另一次访问,因此它仍然是第一次访问的一部分.

However, I now want to disregard visits from users that have visited multiple times within 10 seconds of each visit. Since I don't consider this another visit, its still part of the first visit.

当ip_address相同时,检查 时间戳,只计算那些行 每个相距10秒 其他.

When the ip_address is the same, check timestamp and only count those rows that are 10 seconds away from each other.

我很难将其放入SQL查询表单中,对此我将不胜感激!

I am having difficulty in putting this into a SQL query form, I would appreciate any help on this!

推荐答案

让我从此表开始.我将使用普通时间戳记,以便我们可以轻松了解发生了什么.

Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.

180.2.79.3   2011-01-01 08:00:00
180.2.79.3   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:20
180.2.79.3   2011-01-01 08:00:23
180.2.79.3   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:40
180.2.79.4   2011-01-01 08:00:00
180.2.79.4   2011-01-01 08:00:13
180.2.79.4   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:25
180.2.79.4   2011-01-01 08:00:27
180.2.79.4   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:50

如果我对您的理解正确,那么您想这样算.

If I understand you correctly, you want to count these like this.

180.2.79.3   3
180.2.79.4   3

您可以通过选择两个的最大时间戳来为每个ip_address做到这一点

You can do that for each ip_address by selecting the maximum timestamp that is both

  • 大于当前行的 时间戳记和
  • 比当前行的时间戳大不超过10秒.
  • greater than the current row's timestamp, and
  • less than or equal to 10 seconds greater than the current row's timestamp.

将这两个条件结合起来会引入一些空值,这些空值确实非常有用.

Taking these two criteria together will introduce some nulls, which turn out to be really useful.

select ip_address, 
       t_s.time_stamp, 
       (select max(t.time_stamp) 
        from t_s t 
        where t.ip_address = t_s.ip_address 
          and t.time_stamp > t_s.time_stamp
          and t.time_stamp - t_s.time_stamp <= interval '10' second) next_page
from t_s 
group by ip_address, t_s.time_stamp
order by ip_address, t_s.time_stamp;

ip_address   time_stamp            next_page
180.2.79.3   2011-01-01 08:00:00   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:09   <null>
180.2.79.3   2011-01-01 08:00:20   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:23   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:25   <null>
180.2.79.3   2011-01-01 08:00:40   <null>
180.2.79.4   2011-01-01 08:00:00   <null>
180.2.79.4   2011-01-01 08:00:13   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:23   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:25   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:27   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:29   <null>
180.2.79.4   2011-01-01 08:00:50   <null>

标记访问结束的时间戳记其自己的next_page为空.这是因为该行的时间戳不小于或等于time_stamp + 10秒.

The timestamp that marks the end of a visit has a null for its own next_page. That's because no timestamp is less than or equal to time_stamp + 10 seconds for that row.

要获得计数,我可能会创建一个视图并计算空值.

To get a count, I'd probably create a view and count the nulls.

select ip_address, count(*)
from t_s_visits 
where next_page is null
group by ip_address

180.2.79.3   3
180.2.79.4   3

这篇关于计算彼此相距10秒以内的行数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆