MySQL query histogram for time interval data

Question

I have event input of this type:

event user
event start
event end
event type

Each event is inserted into a MySQL table as its own row, with user + start as the primary key.

I need to query a histogram for one event type by time interval (say, per minute), counting the events that occurred in each interval. Something like:

SELECT count(*) as hits FROM events 
WHERE type="browsing" 
GROUP BY time_diff("2015-1-1" AND "2015-1-2") / 60 * second

But I could not find any way to do that in SQL, short of writing application code. Any ideas?

Sample data

user, start, end, type
1, 2015-1-1 12:00:00, 2015-1-1 12:03:59, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, browsing
2, 2015-1-1 12:03:00, 2015-1-1 12:06:00, eating
3, 2015-1-1 12:03:00, 2015-1-1 12:08:00, browsing

The result should look like this:

         ^
count    |
browsing |
users    |       *
         |       *  *  *  *
         | *  *  *  *  *  *  *  *
         --|--|--|--|--|--|--|--|--|--> minute
         0  1  2  3  4  5  6  7  8  9 

Recommended answer

You can do this using GROUP BY at the granularity you want. Here is an example using the data you gave:

First, the SQL to create the table and populate it. The id column here isn't strictly needed, but it is recommended if the table will be large or have indexes on it.

CREATE TABLE `test`.`events` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `user` INT NULL,
  `start` DATETIME NULL,
  `end` DATETIME NULL,
  `type` VARCHAR(45) NULL,
  PRIMARY KEY (`id`));

INSERT INTO events (user, start, end, type) VALUES 
(1, '2015-1-1 12:00:00', '2015-1-1 12:03:59', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'browsing'),
(2, '2015-1-1 12:03:00', '2015-1-1 12:06:00', 'eating'),
(3, '2015-1-1 12:03:00', '2015-1-1 12:08:00', 'browsing');

To get a list of ordered pairs of event duration in minutes and number of events, the query can be written using the TIMESTAMPDIFF function, as shown below:

SELECT 
    TIMESTAMPDIFF(MINUTE, start, end) as minutes,
    COUNT(*) AS numEvents
FROM
    test.events
GROUP BY TIMESTAMPDIFF(MINUTE, start, end)

Output:

minutes      numEvents
3            3
5            1
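
For readers who want to check the bucketing outside the database, here is a minimal Python sketch of the same idea. It is not part of the original answer: the `duration_histogram` helper and the `EVENTS` list are hypothetical names, and the code assumes `end` is never earlier than `start`. It rounds durations down to whole units, which is how TIMESTAMPDIFF treats the 3 minutes 59 seconds of the first row.

```python
from collections import Counter
from datetime import datetime

# Sample rows from the answer: (user, start, end, type).
EVENTS = [
    (1, datetime(2015, 1, 1, 12, 0, 0), datetime(2015, 1, 1, 12, 3, 59), "browsing"),
    (2, datetime(2015, 1, 1, 12, 3, 0), datetime(2015, 1, 1, 12, 6, 0), "browsing"),
    (2, datetime(2015, 1, 1, 12, 3, 0), datetime(2015, 1, 1, 12, 6, 0), "eating"),
    (3, datetime(2015, 1, 1, 12, 3, 0), datetime(2015, 1, 1, 12, 8, 0), "browsing"),
]


def duration_histogram(events, unit_seconds=60):
    """Count events per whole-unit duration (hypothetical helper).

    Assumes end >= start for every event.
    """
    counts = Counter()
    for _user, start, end, _type in events:
        # Integer division drops the partial unit (3 min 59 s counts as 3).
        whole_units = int((end - start).total_seconds()) // unit_seconds
        counts[whole_units] += 1
    return dict(sorted(counts.items()))


print(duration_histogram(EVENTS))  # {3: 3, 5: 1}
```

Passing `unit_seconds=3600` gives the hourly variant; with this sample data every event lands in bucket 0.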

The first argument to TIMESTAMPDIFF can be one of MICROSECOND (FRAC_SECOND in old MySQL versions), SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.

Here are some more examples of queries you can do:

Events by hour (durations rounded down)

SELECT 
    TIMESTAMPDIFF(HOUR, start, end) as hours,
    COUNT(*) AS numEvents
FROM
    test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)

Events by hour with better formatting

SELECT 
    CONCAT("<", TIMESTAMPDIFF(HOUR, start, end) + 1) as hours,
    COUNT(*) AS numEvents
FROM
    test.events
GROUP BY TIMESTAMPDIFF(HOUR, start, end)

You can group by a variety of options, but this should get you started. Most plotting packages let you specify arbitrary x, y coordinates, so you don't need to worry about missing values on the x axis.

To get a list of ordered pairs of a specific time and the number of events at that time (for logging). Note that this part is left for reference.

Now for the queries. First you have to pick which column to use for the grouping. For example, a task might take more than a minute, so its start and end would fall in different minutes. All of these examples are based on the start time, since that is when the event actually took place.

To group event counts by minute, you can use a query like this:

SELECT 
     DATE_FORMAT(start, '%M %e, %Y %h:%i %p') as minute, 
     count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start), MINUTE(start);

Note how this groups by every component, starting with the year and going down to the minute. The minute is also displayed as a label. The resulting output looks like this:

minute                      numEvents
January 1, 2015 12:00 PM    1
January 1, 2015 12:03 PM    3

You could then read this data with PHP and prepare it for display with one of the many graphing libraries out there, plotting the minute column on the x axis and numEvents on the y axis.
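
The queries above bucket events by their start time. The chart in the question appears instead to count how many events are in progress during each minute. Here is a minimal Python sketch of that reading, done in application code rather than SQL. It is not part of the original answer: `active_per_minute` and `EVENTS` are hypothetical names, and the code assumes each event covers the half-open interval from start up to, but not including, end.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample rows from the question: (user, start, end, type).
EVENTS = [
    (1, datetime(2015, 1, 1, 12, 0, 0), datetime(2015, 1, 1, 12, 3, 59), "browsing"),
    (2, datetime(2015, 1, 1, 12, 3, 0), datetime(2015, 1, 1, 12, 6, 0), "browsing"),
    (2, datetime(2015, 1, 1, 12, 3, 0), datetime(2015, 1, 1, 12, 6, 0), "eating"),
    (3, datetime(2015, 1, 1, 12, 3, 0), datetime(2015, 1, 1, 12, 8, 0), "browsing"),
]


def active_per_minute(events, event_type, origin):
    """Count events of one type in progress during each minute.

    Keys are minute offsets from `origin`. Assumption: an event covers
    [start, end), so one ending exactly at 12:06:00 does not count
    toward the 12:06 minute.
    """
    one_minute = timedelta(minutes=1)
    counts = Counter()
    for _user, start, end, kind in events:
        if kind != event_type:
            continue
        # Begin at the start of the first minute the event touches.
        minute = start.replace(second=0, microsecond=0)
        while minute < end:
            counts[(minute - origin) // one_minute] += 1
            minute += one_minute
    return dict(sorted(counts.items()))


print(active_per_minute(EVENTS, "browsing", datetime(2015, 1, 1, 12, 0)))
# {0: 1, 1: 1, 2: 1, 3: 3, 4: 2, 5: 2, 6: 1, 7: 1}
```

Whether the end minute should count is a modeling choice; treating end as inclusive would add one more minute to events that end exactly on a minute boundary.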

Here are some more examples of queries you can do:

Events by hour

SELECT 
     DATE_FORMAT(start, '%M %e, %Y %h %p') as hour, 
     count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start), HOUR(start);

Events by date

SELECT 
    DATE_FORMAT(start, '%M %e, %Y') as date, 
    count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start), DAYOFMONTH(start);

Events by month

SELECT 
    DATE_FORMAT(start, '%M %Y') as date, 
    count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start), MONTH(start);

Events by year

SELECT 
    DATE_FORMAT(start, '%Y') as date, 
    count(*) AS numEvents 
FROM test.events 
GROUP BY YEAR(start);
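
The queries above differ only in how far the start time is truncated before counting. A minimal Python sketch of that pattern follows, useful for checking results away from the database. It is not part of the original answer: `count_by`, `FORMATS`, and `STARTS` are hypothetical names, and the label formats are ISO-style rather than the DATE_FORMAT strings used in the SQL.

```python
from collections import Counter
from datetime import datetime

# Start times of the four sample events from the answer.
STARTS = [
    datetime(2015, 1, 1, 12, 0, 0),
    datetime(2015, 1, 1, 12, 3, 0),
    datetime(2015, 1, 1, 12, 3, 0),
    datetime(2015, 1, 1, 12, 3, 0),
]

# Hypothetical label formats; each truncates a timestamp to one granularity.
FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def count_by(starts, granularity):
    """Count start times per bucket, returning (label, count) pairs in time order."""
    fmt = FORMATS[granularity]
    counts = Counter(s.strftime(fmt) for s in starts)
    # Zero-padded ISO-style labels sort chronologically as plain strings.
    return sorted(counts.items())


print(count_by(STARTS, "minute"))  # [('2015-01-01 12:00', 1), ('2015-01-01 12:03', 3)]
print(count_by(STARTS, "hour"))    # [('2015-01-01 12', 4)]
```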

I should also point out that if you have an index on the start column of this table, these queries will complete quickly, even with hundreds of millions of rows.

Hope this helps! Let me know if you have any other questions about this.
