我应该如何为具有两个范围条件的查询建立索引? [英] How should I go about indexing for a query with two range conditions?

查看:1012
本文介绍了我应该如何为具有两个范围条件的查询建立索引?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

系列中的下一个…

Next in the series…

CREATE TABLE `Alarms` (
  `AlarmId` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `DeviceId` BINARY(16) NOT NULL,
  `Code` BIGINT(20) UNSIGNED NOT NULL,
  `Ended` TINYINT(1) NOT NULL DEFAULT '0',
  `NaturalEnd` TINYINT(1) NOT NULL DEFAULT '0',
  `Pinned` TINYINT(1) NOT NULL DEFAULT '0',
  `Acknowledged` TINYINT(1) NOT NULL DEFAULT '0',
  `StartedAt` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
  `EndedAt` TIMESTAMP NULL DEFAULT NULL,
  `MarkedForDeletion` TINYINT(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`AlarmId`),
  KEY `Key1` (`Ended`,`Acknowledged`),
  KEY `Key2` (`Pinned`),
  KEY `Key3` (`DeviceId`,`Pinned`),
  KEY `Key4` (`DeviceId`,`StartedAt`,`EndedAt`),
  KEY `Key5` (`DeviceId`,`Ended`,`EndedAt`),
  KEY `Key6` (`MarkedForDeletion`,`DeviceId`,`Acknowledged`,`Ended`,`StartedAt`),
  KEY `Key7` (`MarkedForDeletion`,`DeviceId`,`Ended`,`Pinned`,`EndedAt`)
) ENGINE=INNODB;

当选择时间戳以匹配许多行时,此查询非常慢:

This query is quite slow when the timestamps are chosen such that many rows are matched:

SELECT (((UNIX_TIMESTAMP(`StartedAt`)) DIV 900) * 900) AS `Period_TS`,
COUNT(`AlarmId`) AS `n`
FROM `Alarms`
WHERE `StartedAt` >= FROM_UNIXTIME(1518990000)  
AND `StartedAt` <  FROM_UNIXTIME(1518998400) 
AND `DeviceId` IN (
    UNHEX('00030000000000000000000000000000'),
    UNHEX('000300000000000000000000000181cd'),
    UNHEX('000300000000000000000000000e7cf6'),
    UNHEX('000300000000000000000000000e7cf7'),
    UNHEX('000300000000000000000000000f423f')
) AND `MarkedForDeletion` = FALSE
GROUP BY `Period_TS` ASC;

我认为这是因为我在两个字段上混合范围条件( DeviceId StartedAt )。

I believe that this because I am mixing range conditions on two fields (DeviceId and StartedAt).

如果这是真的,我该怎么做才能解决问题?也许是触发使用索引合并的东西?

If that's true, what can I do to resolve the problem? Perhaps something to trigger use of an index merge?

推荐答案

IN 介于 = 和'范围'之间。所以,我对问题的标题嗤之以鼻。两个范围几乎不可能优化;一个 IN 加上一个范围有一定的优化机会。

IN is sort of between = and a 'range'. So, I quibble with the title on the Question. Two ranges is virtually impossible to optimize; an IN plus a range has some chance of optimization.

基于

WHERE `StartedAt` >= FROM_UNIXTIME(1518990000)  
AND   `StartedAt` <  FROM_UNIXTIME(1518998400) 
AND `DeviceId` IN (
    UNHEX('00030000000000000000000000000000'),
    UNHEX('000300000000000000000000000181cd'),
    UNHEX('000300000000000000000000000e7cf6'),
    UNHEX('000300000000000000000000000e7cf7'),
    UNHEX('000300000000000000000000000f423f')
) AND `MarkedForDeletion` = FALSE

我会提供2个索引并让Optimizer决定使用哪个:

I would provide 2 indexes and let the Optimizer decide which to use:

INDEX(MarkedForDeletion, StartedAt, DeviceId)
INDEX(MarkedForDeletion, DeviceId, StartedAt)

一些较新版本的MySQL / MariaDB可以跳过并使用 second 索引中的所有3列。在所有版本中,任一索引的前两列都使其成为候选者。选择可能由统计数据驱动,并且可能(或可能不)是正确的选择。

Some newer versions of MySQL/MariaDB can leapfrog and make use of all 3 columns in the second index. In all versions the first 2 columns of either index makes it a candidate. The choice may be driven by statistics, and may (or may not) be the 'right' choice.

因为 AlarmId 不能 NULL ,使用模式: COUNT(*)

Since AlarmId cannot be NULL, use the pattern: COUNT(*).

进行更改后,我的每个索引都是覆盖,从而进一步提升了性能。

After making that change, each of my indexes is "covering", thereby giving an extra boost in performance.

这篇关于我应该如何为具有两个范围条件的查询建立索引?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆