为什么MySQL并不总是在这里使用索引合并? [英] Why does MySQL not always use index merge here?

查看:110
本文介绍了为什么MySQL并不总是在这里使用索引合并?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑下表:

CREATE TABLE `Alarms` (
  `AlarmId` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `DeviceId` BINARY(16) NOT NULL,
  `Code` BIGINT(20) UNSIGNED NOT NULL,
  `Ended` TINYINT(1) NOT NULL DEFAULT '0',
  `NaturalEnd` TINYINT(1) NOT NULL DEFAULT '0',
  `Pinned` TINYINT(1) NOT NULL DEFAULT '0',
  `Acknowledged` TINYINT(1) NOT NULL DEFAULT '0',
  `StartedAt` TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
  `EndedAt` TIMESTAMP NULL DEFAULT NULL,
  `MarkedForDeletion` TINYINT(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`AlarmId`),
  KEY `Key1` (`Ended`,`Acknowledged`),
  KEY `Key2` (`Pinned`),
  KEY `Key3` (`DeviceId`,`Pinned`),
  KEY `Key4` (`DeviceId`,`StartedAt`,`EndedAt`),
  KEY `Key5` (`DeviceId`,`Ended`,`EndedAt`),
  KEY `Key6` (`MarkedForDeletion`)
) ENGINE=INNODB;

并且,对于此测试,请按如下方式填充:

And, for this test, populate it like so:

-- Populate some dummy data; 500 alarms for each
-- of 1000 one-second periods
SET @testDevice = UNHEX('00030000000000000000000000000000');

DROP PROCEDURE IF EXISTS `injectAlarms`;
DELIMITER ;;
CREATE PROCEDURE injectAlarms()
BEGIN
    SET @fromdate  = '2018-02-18 00:00:00';
    SET @numdates  = 1000;
    SET @todate    = DATE_ADD(@fromdate, INTERVAL @numdates SECOND);

    -- Create table of alarm codes to join on
    DROP TABLE IF EXISTS `__codes`;
    CREATE TEMPORARY TABLE `__codes` (
        `Code` BIGINT NOT NULL PRIMARY KEY
    );

    SET @startcode = 0;
    SET @endcode   = 499;

    REPEAT
       INSERT INTO `__codes` VALUES(@startcode);
       SET @startcode = @startcode + 1;
    UNTIL @startcode > @endcode END REPEAT;

    -- Add an alarm for each code, for each second in range
    REPEAT
        INSERT INTO `Alarms`
            (`DeviceId`, `Code`, `Ended`, `NaturalEnd`, `Pinned`, `Acknowledged`, `StartedAt`, `EndedAt`)
            SELECT
                @testDevice,
                `Code`,
                TRUE, FALSE, FALSE, FALSE,
                @fromdate, @fromdate
            FROM `__codes`;

        SET @fromdate = DATE_ADD(@fromdate, INTERVAL 1 SECOND);
    UNTIL @fromdate > @todate END REPEAT;
END;;
DELIMITER ;

CALL injectAlarms();

现在,对于某些数据集,以下查询效果非常好:

Now, for some datasets the following query works quite well:

SELECT * FROM `Alarms`
WHERE
   ((`Alarms`.`Ended` = FALSE AND `Alarms`.`Acknowledged` = FALSE) OR `Alarms`.`Pinned` = TRUE) AND
   `MarkedForDeletion` = FALSE AND
   `DeviceId` = @testDevice
;

这是因为MariaDB足够聪明,可以使用索引合并,例如:

This is because MariaDB is clever enough to use index merges, e.g.:

id    select_type    table    type         possible_keys                 
1     SIMPLE         Alarms   index_merge  Key1,Key2,Key3,Key4,Key5,Key6 

key             key_len  ref     rows     Extra
Key1,Key2,Key3  2,1,17   (NULL)  2        Using union(Key1,intersect(Key2,Key3)); Using where

然而,如果我使用上面的过程填充的数据集,并翻转查询bit(这是我需要的另一个视图,但在这种情况下将返回更多行):

However if I use the dataset as populated by the procedure above, and flip the query around a bit (which is another view I need, but in this case will return many more rows):

SELECT * FROM `Alarms`
WHERE
  ((`Alarms`.`Ended` = TRUE OR `Alarms`.`Acknowledged` = TRUE) AND `Alarms`.`Pinned` = FALSE) AND
   `MarkedForDeletion` = FALSE AND
   `DeviceId` = @testDevice
;

…它没有:

… it doesn't:

id    select_type    table    type   possible_keys
1     SIMPLE         Alarms   ref    Key1,Key2,Key3,Key4,Key5,Key6

key   key_len  ref     rows     Extra
Key2  1        const  144706    Using where

我更希望索引合并更频繁地发生。实际上,给定 ref = const ,这个查询计划看起来并不太可怕…但是,查询需要几秒钟才能运行。这本身并不是世界末日,但是我的设计表现不佳,在尝试更具异国情调的查询时显示,这需要非常很长时间:

I would rather like the index merges to happen more often. As it is, given the ref=const, this query plan doesn't look too scary … however, the query takes almost a second to run. That in itself isn't the end of the world, but the poorly-scaling nature of my design shows when trying a more exotic query, which takes a very long time:

-- Create a temporary table that we'll join against in a mo
DROP TABLE IF EXISTS `_ranges`;
CREATE TEMPORARY TABLE `_ranges` (
    `Start` TIMESTAMP NOT NULL DEFAULT 0,
    `End`   TIMESTAMP NOT NULL DEFAULT 0,
    PRIMARY KEY(`Start`, `End`)
);

-- Populate it (in reality this is performed by my application layer)
SET @endtime = 1518992216;
SET @starttime = @endtime - 86400;
SET @inter = 900;
DROP PROCEDURE IF EXISTS `populateRanges`;
DELIMITER ;;
CREATE PROCEDURE populateRanges()
BEGIN
REPEAT
    INSERT IGNORE INTO `_ranges` VALUES(FROM_UNIXTIME(@starttime),FROM_UNIXTIME(@starttime + @inter));
    SET @starttime = @starttime + @inter;
UNTIL @starttime > @endtime END REPEAT;
END;;
DELIMITER ;
CALL populateRanges();

-- Actual query
SELECT UNIX_TIMESTAMP(`_ranges`.`Start`) AS `Start_TS`,
COUNT(`Alarms`.`AlarmId`) AS `n`
FROM `_ranges`
LEFT JOIN `Alarms`
ON `Alarms`.`StartedAt` < `_ranges`.`End`
  AND (`Alarms`.`EndedAt` IS NULL OR `Alarms`.`EndedAt` >= `_ranges`.`Start`)

  AND ((`Alarms`.`EndedAt` IS NULL AND `Alarms`.`Acknowledged` = FALSE) OR `Alarms`.`Pinned` = TRUE)
-- Again, the above condition is sometimes replaced by:
-- AND ((`Alarms`.`EndedAt` IS NOT NULL OR `Alarms`.`Acknowledged` = TRUE) AND `Alarms`.`Pinned` = FALSE)

 AND `DeviceId` = @testDevice
 AND `MarkedForDeletion` = FALSE
 GROUP BY `_ranges`.`Start`

(此查询应该收集每个时间片的计数列表,每个计数表示[StartingAt,EndedAt]范围与该时间片相交的警报数量。结果填充了一个折线图。)

(This query is supposed to gather a list of counts per time slice, each count indicating how many alarms' [StartedAt,EndedAt] range intersects that time slice. The result populates a line graph.)

同样,当我设计这些表并且它们中没有很多行时,索引合并似乎使一切都好转。但现在不是这样的:使用 injectAlarms()中给出的数据集,这需要 40秒才能完成!

Again, when I designed these tables and there weren't many rows in them, index merges seemed to make everything whiz along. But now not so: with the dataset as given in injectAlarms(), this takes 40 seconds to complete!

我在添加 MarkedForDeletion 列并执行我的第一次大型数据集比例测试时注意到了这一点。这就是为什么我选择索引并没有因为 MarkedForDeletion 的存在而造成太大影响,尽管如果我删除 AND MarkedForDeletion = FALSE 来自我的查询;但是,我保持这种状态,最终我需要它在那里。

I noticed this when adding the MarkedForDeletion column and performing some of my first large-dataset scale tests. This is why my choice of indexes doesn't make a big deal out of the presence of MarkedForDeletion, though the results described above are the same if I remove AND MarkedForDeletion = FALSE from my queries; however, I've kept the condition in, as ultimately I will need it to be there.

我尝试了一些 USE INDEX / FORCE INDEX 组合,但结果似乎永远不会使用索引合并。

I've tried a few USE INDEX/FORCE INDEX combinations, but it never seems to use index merge as a result.

我可以定义哪些索引使这个表在给定的情况下表现得很快?或者我如何重构我的查询以实现相同的目标?

以上在MariaDB 5.5.56 / CentOS 7上获得的查询计划,但是解决方案也必须适用于MySQL 5.1.73 / CentOS 6.。

推荐答案

哇!这是我见过的最复杂的索引合并。

Wow! That's the most complicated "index merge" I have seen.

通常(也许总是),你可以制作一个'复合'索引来代替索引合并交叉,并执行更好的。将 key2 (固定)更改为(固定,DeviceId)。这个可以摆脱'相交'并加快它的速度。

Usually (perhaps always), you can make a 'composite' index to replace an index-merge-intersect, and perform better. Change key2 from just (pinned) to (pinned, DeviceId). This may get rid of the 'intersect' and speed it up.

一般来说,优化器只在绝望中使用索引合并。 (我认为这是标题问题的答案。)对查询或所涉及的值进行任何细微更改,优化程序将执行查询而不进行索引合并。

In general, the Optimizer uses index merge only in desperation. (I think this is the answer to the title question.) Any slight changes to the query or the values involved, and the Optimizer will perform the query without index merge.

临时表 __代码的改进是构建具有大范围值的永久表,然后使用Proc中该表的一系列值。如果您使用的是MariaDB,则使用动态构建的序列表。例如,'table' seq_1_to_100 有效地一列,数字为1..100。无需声明或填充它。

An improvement on the temp table __codes is to build a permanent table with a large range of values, then use a range of values from that table inside your Proc. If you are using MariaDB, then use the dynamically built "sequence" table. For example the 'table' seq_1_to_100 is effectively a table of one column with numbers 1..100. No need to declare it or populate it.

您可以通过摆脱其他 REPEAT 循环计算时间代码

避免 LOOPs 将是最大的性能优势。

Avoiding LOOPs will be the biggest performance benefit.

完成所有工作,然后我可能会有其他提示。

Get all that done, then I may have other tips.

这篇关于为什么MySQL并不总是在这里使用索引合并?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆