MySQL根据ORDER BY查询的限制值使用不同的索引 [英] MySQL using different index depending on limit value with ORDER BY query
问题描述
这对我来说很奇怪:一张表 'ACTIVITIES',在 ACTIVITY_DATE 有一个索引.具有不同 LIMIT 值的完全相同的查询会导致不同的执行计划.
This is weird to me: One table 'ACTIVITIES' with one index on ACTIVITY_DATE. The exact same query with different LIMIT value results in different execution plan.
这是:
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 20
-> ;
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
| 1 | SIMPLE | ACTIVITIES | index | NULL | ACTI_DATE_I | 4 | NULL | 20 | |
+----+-------------+------------+-------+---------------+-------------+---------+------+------+-------+
1 row in set (0.00 sec)
mysql> explain select * from ACTIVITIES order by ACTIVITY_DATE desc limit 150
-> ;
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | ACTIVITIES | ALL | NULL | NULL | NULL | NULL | 10629 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+-------+----------------+
1 row in set (0.00 sec)
为什么当我限制 150 时它不使用索引?我的意思是,扫描 150 行似乎比扫描 10629 行快,对吗?
How come when I limit 150 it is not using the index? I mean, scanning 150 lines seems faster than scanning 10629 rows, right?
编辑
查询使用索引直到limit 96"并在limit 97"开始文件排序.该表没有任何具体内容,甚至没有外键,这是完整的创建表:
The query uses the index till "limit 96" and starts filesort at "limit 97". The table has nothing specific, even not a foreign key, here is the complete create table:
mysql> show create table ACTIVITIES\G
*************************** 1. row ***************************
Table: ACTIVITIES
Create Table: CREATE TABLE `ACTIVITIES` (
`ACTIVITY_ID` int(11) NOT NULL AUTO_INCREMENT,
`ACTIVITY_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`USER_KEY` varchar(50) NOT NULL,
`ITEM_KEY` varchar(50) NOT NULL,
`ACTIVITY_TYPE` varchar(1) NOT NULL,
`EXTRA` varchar(500) DEFAULT NULL,
`IS_VISIBLE` varchar(1) NOT NULL DEFAULT 'Y',
PRIMARY KEY (`ACTIVITY_ID`),
KEY `ACTI_USER_I` (`USER_KEY`,`ACTIVITY_DATE`),
KEY `ACTIVITY_ITEM_I` (`ITEM_KEY`,`ACTIVITY_DATE`),
KEY `ACTI_ITEM_TYPE_I` (`ITEM_KEY`,`ACTIVITY_TYPE`,`ACTIVITY_DATE`),
KEY `ACTI_DATE_I` (`ACTIVITY_DATE`)
) ENGINE=InnoDB AUTO_INCREMENT=10091 DEFAULT CHARSET=utf8 COMMENT='Logs activity'
1 row in set (0.00 sec)
mysql>
我也尝试运行ANALYSE TABLE ACTIVITIES",但这并没有改变任何事情.
I also tried to run "ANALYSE TABLE ACTIVITIES" but that did not change a thing.
推荐答案
事情就是这样.等我一分钟...
That's the way things go. Bear with me a minute...
优化器希望使用 INDEX,在本例中为 ACTI_DATE_I.但它不想使用它,如果那样会更慢.
The Optimizer would like to use an INDEX, in this case ACTI_DATE_I. But it does not want to use it if that would be slower.
方案 A:使用索引.
- 最后到达 BTree 结构的索引(因为 DESC)
- 向后扫描
- 对于索引中的每一行,在数据中查找对应的行.注意:索引有 (ACTIVITY_DATE, ACTIVITY_ID) 因为 PRIMARY KEY 隐式附加到任何辅助键.使用 PK (ACTIVITY_ID) 访问数据"是另一个 BTree 查找,可能是随机的.因此,它可能很慢.(但在你的情况下不是很慢.)
- 这在 LIMIT 行之后停止.
计划 B:忽略表格
- 扫描表,构建一个 tmp 表.(可能在内存中.)
- 对tmp表进行排序
- 剥离 LIMIT 行.
在您的情况下(96 - 10K 的 1%),它选择了表扫描是令人惊讶的.通常,截止值大约是表中行数的 10%-30%.
In your case (96 -- 1% of 10K) it is surprising that it picked the table scan. Normally, the cutoff is somewhere around 10%-30% of the number of rows in the table.
ANALYZE TABLE
应该重新计算统计数据,这可以说服它与另一个计划一起使用.
ANALYZE TABLE
should have caused a recalculation of the statistics, which could have convinced it to go with the other Plan.
您使用的是什么版本的 MySQL?(不,我不知道这方面有任何变化.)
What version of MySQL are you using? (No, I don't know of any changes in this area.)
您可以尝试一件事:OPTIMIZE TABLE ACTIVITIES;
这将重建表,从而重新打包块并导致可能不同的统计数据.如果这有帮助,我想知道——因为我通常说优化表没用".
One thing you could try: OPTIMIZE TABLE ACTIVITIES;
That will rebuild the table, thereby repacking the blocks and leading to potentially different statistics. If that helps, I would like to know it -- since I normally say "Optimize table is useless".
这篇关于MySQL根据ORDER BY查询的限制值使用不同的索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!