mysql选择时间跨度之间的行数 [英] mysql select number of rows between time span

查看:83
本文介绍了mysql选择时间跨度之间的行数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试获取特定时间范围或时间跨度之间的总行数. 基本上,让我们说一下下表:

I'm trying to get the total number of rows between a specific amout of time or time span. Basically, let's say the following table:

CREATE TABLE IF NOT EXISTS `downloads` (
`id` int(7) NOT NULL AUTO_INCREMENT,
`stuff_id` int(7) NOT NULL,
`user_id` int(7) NOT NULL,
`dl_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;

每次有人下载内容时,都会填充此表.

And this table is populated each time someone downloads something.

因此,我真正需要的是获取用户列表(user_id),这些用户列表在例如24小时的时间内进行了超过100次下载.不是在过去的24小时内,而是在那个确切的时间段内,即使去年圣诞节期间也是如此=)

So what I really need is to get a list of users (user_id) that have made more than for example 100 downloads in a periods of time of, for example 24 hours. Not in the last 24 hours, but IN that exact period of time even if it has been during christmas last year =)

有什么想法吗?!

推荐答案

好,我知道我来晚了,但是我还是想发表我的回答:-)

OK, I realise I'm a bit late, but I wanted to post my answer anyway :-)

您可以使用子查询来完成所需的操作,但这可能需要很长时间才能在大型表上完成...

What you require can be done using a subquery, but this might take ages to complete on a large table...

关于这个问题,我想到了两种不同的方法.

Thinking about the question I came to two different approaches.

其中一个已经在其他答案中得到了解决,它的工作方式是从特定的时间点开始,查看此时开始的时间间隔,然后查看紧随其后的等时间隔.这样可以带来清晰,可理解的结果,并且可能是所需的结果(例如,每个日历日用户不得超过100次下载).但是,这将完全错过用户在午夜前一小时进行99次下载,而在新一天的第一小时又进行99次下载的情况.

One of them has already been dealt with in the other answers, it works by starting at a specific point in time, looking at the interval that begins at this time and then looking at the interval of equal duration that immediately follows. This leads to clear, understandable results and is probably what would be required (e.g. user must not exceed 100 downloads per calender day). This however would completely miss situations in which a user does 99 downloads during the hour before midnight and another 99 in the first hour of the new day.

因此,如果所需结果是前十名下载者列表"中的更多内容,则这是另一种方法.乍看之下,这里的结果可能不太容易理解,因为一次下载可以计入多个时间间隔.这是因为间隔将(并且需要)重叠.

So if the required result is more of a "top ten downloaders list", then this is the other approach. The results here may not be as understandable at first glance, because one single download can count towards multiple intervals. This is because the intervals will (and need to) overlap.

这是我的设置.我已经根据您的声明创建了表格,并添加了两个索引:

Here's my setup. I've created the table from your statement and added two indexes:

CREATE INDEX downloads_timestamp on downloads (dl_date);
CREATE INDEX downloads_user_id on downloads (user_id);

我插入表中的数据:

SELECT * FROM downloads;
+----+----------+---------+---------------------+
| id | stuff_id | user_id | dl_date             |
+----+----------+---------+---------------------+
|  1 |        1 |       1 | 2011-01-24 09:00:00 |
|  2 |        1 |       1 | 2011-01-24 09:30:00 |
|  3 |        1 |       1 | 2011-01-24 09:35:00 |
|  4 |        1 |       1 | 2011-01-24 10:00:00 |
|  5 |        1 |       1 | 2011-01-24 11:00:00 |
|  6 |        1 |       1 | 2011-01-24 11:15:00 |
|  7 |        1 |       1 | 2011-01-25 09:15:00 |
|  8 |        1 |       1 | 2011-01-25 09:30:00 |
|  9 |        1 |       1 | 2011-01-25 09:45:00 |
| 10 |        1 |       2 | 2011-01-24 08:00:00 |
| 11 |        1 |       2 | 2011-01-24 12:00:00 |
| 12 |        1 |       2 | 2011-01-24 12:01:00 |
| 13 |        1 |       2 | 2011-01-24 12:02:00 |
| 14 |        1 |       2 | 2011-01-24 12:03:00 |
| 15 |        1 |       2 | 2011-01-24 12:00:00 |
| 16 |        1 |       2 | 2011-01-24 12:04:00 |
| 17 |        1 |       2 | 2011-01-24 12:05:00 |
| 18 |        1 |       2 | 2011-01-24 12:06:00 |
| 19 |        1 |       2 | 2011-01-24 12:07:00 |
| 20 |        1 |       2 | 2011-01-24 12:08:00 |
| 21 |        1 |       2 | 2011-01-24 12:09:00 |
| 22 |        1 |       2 | 2011-01-24 12:10:00 |
| 23 |        1 |       2 | 2011-01-25 14:00:00 |
| 24 |        1 |       2 | 2011-01-25 14:12:00 |
| 25 |        1 |       2 | 2011-01-25 14:25:00 |
+----+----------+---------+---------------------+
25 rows in set (0.00 sec)

如您所见,所有下载都是在昨天或今天进行的,并且是由两个不同的用户执行的.

As you can see, all downloads occured either yesterday or today and were executed by two different users.

现在,我们要注意的是:在数学上"为"2011-01-24 0:00"到"2011-01"之间的24小时间隔(或任何其他持续时间的间隔)是无限的-25 23:59:59'.但是,由于服务器的精度为一秒,因此可以归结为86,400个间隔:

Now, what we have to mind is the following: There is (mathematically) an infinite number of 24 hour intervals (or intervals of any other duration) between '2011-01-24 0:00' and '2011-01-25 23:59:59'. But as the server's precision is one second, this boils down to 86,400 intervals:

First interval:  2011-01-24 0:00:00 -> 2011-01-25 0:00:00
Second interval: 2011-01-24 0:00:01 -> 2011-01-25 0:00:01
Third interval: 2011-01-24 0:00:02 -> 2011-01-25 0:00:02
   .
   .
   .
86400th interval: 2011-01-24 23:59:59 -> 2011-01-25 23:59:59

因此,我们可以使用循环遍历所有这些间隔,并计算每个用户和每个间隔的下载次数.当然,并非所有间隔都对我们有相同的兴趣,因此我们可以通过使用表中的时间戳作为间隔开始"来跳过其中的一些.

So we could use a loop to iterate over all these intervals and calculate the number of downloads per user and per interval. Of course, not all intervals are of the same interest to us, so we can skip some of them by using the timestamps in the table as "beginning of interval".

这是以下查询的作用.它使用表中的每个下载时间戳作为间隔开始",添加间隔持续时间,然后查询此间隔内每个用户的下载次数.

This is what the following query does. It uses every download timestamp in the table as "start of interval", adds the interval duration and then queries the number of downloads per user during this interval.

SET @duration = '24:00:00';
SET @limit = 5;
SELECT * FROM 
    (SELECT t1.user_id, 
            t1.dl_date startOfPeriod, 
            ADDTIME(t1.dl_date,@duration) endOfPeriod, 
           (SELECT COUNT(1) 
            FROM downloads t2 
            WHERE t1.user_id = t2.user_id 
            AND t1.dl_date <= t2.dl_date 
            AND ADDTIME(t1.dl_date,@duration) >= t2.dl_date) count
     FROM downloads t1) t3 
WHERE count > @limit;

结果如下:

+---------+---------------------+---------------------+-------+
| user_id | startOfPeriod       | endOfPeriod         | count |
+---------+---------------------+---------------------+-------+
|       1 | 2011-01-24 09:00:00 | 2011-01-25 09:00:00 |     6 |
|       1 | 2011-01-24 09:30:00 | 2011-01-25 09:30:00 |     7 |
|       1 | 2011-01-24 09:35:00 | 2011-01-25 09:35:00 |     6 |
|       1 | 2011-01-24 10:00:00 | 2011-01-25 10:00:00 |     6 |
|       2 | 2011-01-24 08:00:00 | 2011-01-25 08:00:00 |    13 |
|       2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 |    12 |
|       2 | 2011-01-24 12:01:00 | 2011-01-25 12:01:00 |    10 |
|       2 | 2011-01-24 12:02:00 | 2011-01-25 12:02:00 |     9 |
|       2 | 2011-01-24 12:03:00 | 2011-01-25 12:03:00 |     8 |
|       2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 |    12 |
|       2 | 2011-01-24 12:04:00 | 2011-01-25 12:04:00 |     7 |
|       2 | 2011-01-24 12:05:00 | 2011-01-25 12:05:00 |     6 |
+---------+---------------------+---------------------+-------+
12 rows in set (0.00 sec)

这篇关于mysql选择时间跨度之间的行数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆