Counting rows in event table, grouped by time range, a lot

Problem description

Imagine I have a table like this:

CREATE TABLE `Alarms` (
    `AlarmId` INT UNSIGNED NOT NULL AUTO_INCREMENT
        COMMENT "32-bit ID",

    `Ended` BOOLEAN NOT NULL DEFAULT FALSE
        COMMENT "Whether the alarm has ended",

    `StartedAt` TIMESTAMP NOT NULL DEFAULT 0
        COMMENT "Time at which the alarm was raised",

    `EndedAt` TIMESTAMP NULL
        COMMENT "Time at which the alarm ended (NULL iff Ended=false)",

    PRIMARY KEY (`AlarmId`),

    KEY `Key4` (`StartedAt`),
    KEY `Key5` (`Ended`, `EndedAt`)
) ENGINE=InnoDB;

Now, for a GUI, I want to produce:

  • a list of days during which at least one alarm was "active"
  • for each day, how many alarms started
  • for each day, how many alarms ended

The intent is to present users with a dropdown box from which they can choose a date to see any alarms active (started before or during, and ended during or after) on that day. So something like this:

+-----------------------------------+
| Choose day                      ▼ |
+-----------------------------------+
|   2017-12-03 (3 started)          |
|   2017-12-04 (1 started, 2 ended) |
|   2017-12-05 (2 ended)            |
|   2017-12-16 (1 started, 1 ended) |
|   2017-12-17 (1 started)          |
|   2017-12-18                      |
|   2017-12-19                      |
|   2017-12-20                      |
|   2017-12-21 (1 ended)            |
+-----------------------------------+
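The "active on a day" rule above (started before or during the day, ended during or after it, or not ended at all) can be written as a small predicate. This is a sketch, assuming each local day has already been turned into a half-open UTC interval [day_start, day_end) in seconds since the Unix epoch; the `Alarm` struct and `is_active_on` are hypothetical names, not part of the schema.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory mirror of one `Alarms` row. Times are seconds since
// the Unix epoch in UTC, matching a session with time_zone = '+00:00'.
struct Alarm {
    std::int64_t started_at;
    bool ended;
    std::int64_t ended_at;  // only meaningful when ended == true
};

// A local day expressed as the half-open UTC interval [day_start, day_end).
// The alarm is active on that day if it started before the day ended and
// either has not ended yet or ended at or after the day began.
bool is_active_on(const Alarm& a, std::int64_t day_start, std::int64_t day_end) {
    assert(day_start < day_end);
    const bool started_by_end = a.started_at < day_end;
    const bool not_ended_before_start = !a.ended || a.ended_at >= day_start;
    return started_by_end && not_ended_before_start;
}
```

Using `>=` for the end and `<` for the start keeps the predicate consistent with half-open ranges, so an alarm ending exactly at local midnight is counted on the day that begins at that midnight.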

I will probably force an age limit on alarms so that they are archived/removed after, say, a year. So that's the scale we're working with.

I expect anywhere from zero to tens of thousands of alarms per day.

My first thought was a reasonably simple query:

(
    SELECT
        COUNT(`AlarmId`) AS `NumStarted`,
        NULL AS `NumEnded`,
        DATE(`StartedAt`) AS `Date`
    FROM `Alarms`
    GROUP BY `Date`
)
UNION
(
    SELECT
        NULL AS `NumStarted`,
        COUNT(`AlarmId`) AS `NumEnded`,
        DATE(`EndedAt`) AS `Date`
    FROM `Alarms`
    WHERE `Ended` = TRUE
    GROUP BY `Date`
);

This uses both of my indexes, with join type ref and ref type const, which I'm happy with. I can iterate over the resultset, dumping the non-NULL values found into a C++ std::map<boost::gregorian::date, std::pair<size_t, size_t>> (then "filling the gaps" for days on which no alarms started or ended, but were active from previous days).
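The "filling the gaps" step can be sketched as a single pass that keeps a running count of open alarms. This assumes days are plain consecutive integers standing in for boost::gregorian::date, and that the map holds (started, ended) counts per day as collected from the query; `fill_gaps`, `DayEntry` and the `last_day` parameter (e.g. today, so that alarms which never ended keep their days listed) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Stand-in for boost::gregorian::date: consecutive days differ by 1.
using Day = long;
// first = alarms started that day, second = alarms ended that day
using Counts = std::pair<std::size_t, std::size_t>;

struct DayEntry {
    Day day;
    std::size_t started;
    std::size_t ended;
};

// Walks every day from the first one with any start/end through
// max(last recorded day, last_day), tracking how many alarms are open.
// A day is kept if something started or ended on it, or if an alarm from an
// earlier day was still open when it began.
std::vector<DayEntry> fill_gaps(const std::map<Day, Counts>& counts, Day last_day) {
    std::vector<DayEntry> out;
    if (counts.empty()) return out;
    const Day first = counts.begin()->first;
    const Day last = std::max(counts.rbegin()->first, last_day);
    long long open = 0;  // alarms open at the start of day d
    for (Day d = first; d <= last; ++d) {
        const auto it = counts.find(d);
        const std::size_t started = it != counts.end() ? it->second.first : 0;
        const std::size_t ended = it != counts.end() ? it->second.second : 0;
        if (open > 0 || started > 0 || ended > 0) {
            out.push_back(DayEntry{d, started, ended});
        }
        open += static_cast<long long>(started) - static_cast<long long>(ended);
        if (open < 0) open = 0;  // e.g. if the matching starts were archived
    }
    return out;
}
```

Fed with the counts from the example dropdown (December 3, 4, 5, 16, 17 and 21), it yields days 3, 4, 5 and 16 through 21, leaving out 6 to 15 because no alarm was open then.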

The spanner I'm throwing in the works is that the list should take into account location-based timezones, but only my application knows about timezones. For logistical reasons, the MySQL session is deliberately SET time_zone = '+00:00' so that timestamps are all kicked out in UTC. (Various other tools are then used to perform any necessary location-specific corrections for historical timezones, taking into account DST and whatnot.) For the rest of the application this is great, but for this particular query it breaks the date GROUPing.

Maybe I could pre-calculate (in my application) a list of time ranges, and generate a huge query of 2n UNIONed queries (where n = number of "days" to check) and get the NumStarted and NumEnded counts that way:

-- Example assuming desired timezone is +05:00
--
-- 3rd December
(
    SELECT
        COUNT(`AlarmId`) AS `NumStarted`,
        NULL AS `NumEnded`,
        '2017-12-03' AS `Date`
    FROM `Alarms`
    -- Alarm started during 3rd December UTC+5
    WHERE `StartedAt` >= '2017-12-02 19:00:00'
      AND `StartedAt` <  '2017-12-03 19:00:00'
    GROUP BY `Date`
)
UNION
(
    SELECT
        NULL AS `NumStarted`,
        COUNT(`AlarmId`) AS `NumEnded`,
        '2017-12-03' AS `Date`
    FROM `Alarms`
    -- Alarm ended during 3rd December UTC+5
    WHERE `EndedAt` >= '2017-12-02 19:00:00'
      AND `EndedAt` <  '2017-12-03 19:00:00'
    GROUP BY `Date`
)
UNION

-- 4th December
(
    SELECT
        COUNT(`AlarmId`) AS `NumStarted`,
        NULL AS `NumEnded`,
        '2017-12-04' AS `Date`
    FROM `Alarms`
    -- Alarm started during 4th December UTC+5
    WHERE `StartedAt` >= '2017-12-03 19:00:00'
      AND `StartedAt` <  '2017-12-04 19:00:00'
    GROUP BY `Date`
)
UNION
(
    SELECT
        NULL AS `NumStarted`,
        COUNT(`AlarmId`) AS `NumEnded`,
        '2017-12-04' AS `Date`
    FROM `Alarms`
    -- Alarm ended during 4th December UTC+5
    WHERE `EndedAt` >= '2017-12-03 19:00:00'
      AND `EndedAt` <  '2017-12-04 19:00:00'
    GROUP BY `Date`
)
UNION

-- 5th December
-- [..]

But, of course, even if I'm restricting the database to a year's worth of historical alarms, that's up to like 730 UNIONed SELECTs (two per day). My spidey senses tell me that this is a very bad idea.

How else can I generate these sort of time-grouped statistics? Or is this really silly and I should look at resolving the problems preventing me from using tzinfo with MySQL?

Must work on MySQL 5.1.73 (CentOS 6) and MariaDB 5.5.50 (CentOS 7).

Solution

The UNION approach is actually not far off a viable solution; you can achieve the same thing, without a catastrophically large query, by recruiting a temporary table:

CREATE TEMPORARY TABLE `_ranges` (
   `Start` TIMESTAMP NOT NULL DEFAULT 0,
   `End`   TIMESTAMP NOT NULL DEFAULT 0,
   PRIMARY KEY (`Start`, `End`)
);

INSERT INTO `_ranges` VALUES
   -- 3rd December UTC+5
   ('2017-12-02 19:00:00', '2017-12-03 19:00:00'),
   -- 4th December UTC+5
   ('2017-12-03 19:00:00', '2017-12-04 19:00:00'),
   -- 5th December UTC+5
   ('2017-12-04 19:00:00', '2017-12-05 19:00:00'),
   -- etc.
;

-- Now the queries needed are simple and also quick:

SELECT
   `_ranges`.`Start`,
   COUNT(`AlarmId`) AS `NumStarted`
FROM `_ranges` LEFT JOIN `Alarms`
  ON `Alarms`.`StartedAt` >= `_ranges`.`Start`
  AND `Alarms`.`StartedAt` <  `_ranges`.`End`
GROUP BY `_ranges`.`Start`;

SELECT
   `_ranges`.`Start`,
   COUNT(`AlarmId`) AS `NumEnded`
FROM `_ranges` LEFT JOIN `Alarms`
  ON `Alarms`.`EndedAt` >= `_ranges`.`Start`
  AND `Alarms`.`EndedAt` <  `_ranges`.`End`
GROUP BY `_ranges`.`Start`;

DROP TABLE `_ranges`;

(This approach was inspired by a DBA.SE post.)
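The `_ranges` rows themselves come from the application. A minimal sketch, assuming one fixed offset east of UTC (in minutes) for the whole period, i.e. no DST transition inside it; with DST, the per-day offsets would have to come from whatever tooling already handles historical time zones. Since UTC = local time - offset, local midnight on 2017-12-03 is 2017-12-02 19:00:00 UTC at UTC+5 (offset +300) and 2017-12-03 05:00:00 UTC at UTC-5 (offset -300). The date arithmetic uses Howard Hinnant's days_from_civil / civil_from_days algorithms; `utc_range_for_local_day` and `build_ranges_insert` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

struct Civil {
    std::int64_t y;
    unsigned m, d;
};

// Inverse of days_from_civil (Hinnant's civil_from_days).
Civil civil_from_days(std::int64_t z) {
    z += 719468;
    const std::int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const unsigned mp = (5 * doy + 2) / 153;
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;
    return {static_cast<std::int64_t>(yoe) + era * 400 + (m <= 2), m, d};
}

// Formats seconds since the Unix epoch as 'YYYY-MM-DD HH:MM:SS' (UTC).
std::string format_utc(std::int64_t secs) {
    std::int64_t days = secs / 86400;
    std::int64_t rem = secs % 86400;
    if (rem < 0) { rem += 86400; --days; }
    const Civil c = civil_from_days(days);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%04lld-%02u-%02u %02lld:%02lld:%02lld",
                  static_cast<long long>(c.y), c.m, c.d,
                  static_cast<long long>(rem / 3600),
                  static_cast<long long>(rem / 60 % 60),
                  static_cast<long long>(rem % 60));
    return buf;
}

struct UtcRange {
    std::string start, end;  // half-open: [start, end)
};

// Hypothetical helper: UTC bounds of one local calendar day, given the zone's
// fixed offset east of UTC in minutes (+300 for UTC+5, -300 for UTC-5).
UtcRange utc_range_for_local_day(std::int64_t local_day, int offset_minutes) {
    const std::int64_t start =
        local_day * 86400 - static_cast<std::int64_t>(offset_minutes) * 60;
    return {format_utc(start), format_utc(start + 86400)};
}

// Hypothetical helper: one multi-row INSERT covering `num_days` consecutive
// local days. The values are generated timestamps, so no escaping is needed.
std::string build_ranges_insert(std::int64_t first_local_day, int num_days,
                                int offset_minutes) {
    assert(num_days > 0);
    std::string sql = "INSERT INTO `_ranges` VALUES\n";
    for (int i = 0; i < num_days; ++i) {
        const UtcRange r = utc_range_for_local_day(first_local_day + i, offset_minutes);
        sql += "   ('" + r.start + "', '" + r.end + "')";
        sql += (i + 1 < num_days) ? ",\n" : ";\n";
    }
    return sql;
}
```

For example, `build_ranges_insert(days_from_civil(2017, 12, 3), 3, 300)` produces the three rows with 19:00:00 boundaries shown in the INSERT above.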

Notice that there are two SELECTs — the original UNION is no longer possible, because temporary tables cannot be accessed twice in the same query. However, since we've already introduced additional statements anyway (the CREATE, INSERT and DROP), this seems to be a moot problem in the circumstances.

In both cases, each row represents one of our requested periods, and the first column equals the "start" part of the period (so that we can identify it in the resultset).
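On the application side, the two result sets can then be folded back into one (NumStarted, NumEnded) pair per period, keyed by the `Start` column. A sketch, assuming `Start` is fetched as text in the 'YYYY-MM-DD HH:MM:SS' form, which sorts chronologically as a plain string because every field is zero-padded; `RangeCountRow` and `merge_counts` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One row from either SELECT: `_ranges`.`Start` and the COUNT(...) for it.
struct RangeCountRow {
    std::string start;
    std::size_t count;
};

// Combines the NumStarted and NumEnded result sets into one entry per period.
// Both SELECTs LEFT JOIN from `_ranges`, so each should return one row per
// requested period, with 0 where no alarm matched.
std::map<std::string, std::pair<std::size_t, std::size_t>>
merge_counts(const std::vector<RangeCountRow>& started,
             const std::vector<RangeCountRow>& ended) {
    assert(started.size() == ended.size());
    std::map<std::string, std::pair<std::size_t, std::size_t>> merged;
    for (const auto& row : started) merged[row.start].first = row.count;
    for (const auto& row : ended) merged[row.start].second = row.count;
    return merged;
}
```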

Be sure to use exception handling in your code as needed to ensure that _ranges is DROPped before your routine returns; although the temporary table is local to the MySQL session, if you're continuing to use that session afterwards then you probably want a clean state, particularly if this function is going to be used again.
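In C++, one way to guarantee the DROP is a small RAII guard whose destructor issues it, so it also runs during stack unwinding. A sketch, assuming a hypothetical `exec` callback standing in for whatever your MySQL wrapper uses to run a statement. It uses DROP TEMPORARY TABLE IF EXISTS, so it cannot remove a permanent table of the same name and does not fail if the table is already gone.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Drops `_ranges` when it goes out of scope, including after an exception.
class RangesTableGuard {
public:
    explicit RangesTableGuard(std::function<void(const std::string&)> exec)
        : exec_(std::move(exec)) {
        assert(exec_);
    }
    RangesTableGuard(const RangesTableGuard&) = delete;
    RangesTableGuard& operator=(const RangesTableGuard&) = delete;

    ~RangesTableGuard() {
        try {
            exec_("DROP TEMPORARY TABLE IF EXISTS `_ranges`");
        } catch (...) {
            // Destructors must not throw; if the DROP fails, the temporary
            // table still disappears when the session ends.
        }
    }

private:
    std::function<void(const std::string&)> exec_;
};
```

Construct the guard immediately after CREATE TEMPORARY TABLE succeeds; an exception from the INSERT or either SELECT then still leaves the session clean.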

If this is still too heavy, for example because you have many time periods and the temporary table (and the INSERT that fills it) will therefore become too large, or because multiple statements don't fit in your calling code, or because your user doesn't have permission to create and drop temporary tables, you'll have to fall back on a simple GROUP BY on the DATE() of each timestamp converted to the target zone with CONVERT_TZ() (which needs MySQL's time zone tables for named zones), and ensure that your users run mysql_tzinfo_to_sql whenever the system's tzdata is updated.
