MySQL没有从索引中选择正确的行数 [英] MySQL not picking correct row count from index
问题描述
我有一个下表
CREATE TABLE `test_series_analysis_data` (
`email` varchar(255) NOT NULL,
`mappingId` int(11) NOT NULL,
`packageId` varchar(255) NOT NULL,
`sectionName` varchar(255) NOT NULL,
`createdAt` datetime(3) DEFAULT NULL,
`marksObtained` float NOT NULL,
`updatedAt` datetime DEFAULT NULL,
`testMetaData` longtext,
PRIMARY KEY (`email`,`mappingId`,`packageId`,`sectionName`),
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`),
KEY `mapping_package` (`mappingId`,`packageId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
以下是查询的解释输出:
Following is the output of the explain for the queries:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | rank_index,mapping_package | rank_index | 4 | const | 20160 | 1.00 | Using where; Using index |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
查询优化器可以同时使用这两个索引,但 rank_index 是一个覆盖索引,所以它被选中了.令我惊讶的是以下查询的输出:
Query optimizer could have used both indexes but rank_index is a covering index so it got picked. What surprises me is the output of the following query:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta use index (mapping_package)
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | mapping_package | mapping_package | 4 | const | 19434 | 1.00 | Using index condition |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
当使用的索引是 mapping_package 时,为什么 rows
更少 (19434<20160).rank_index 可以更好地选择需要的内容,因此 rank_index 中的行数应该更少.
Why are there rows
lesser (19434<20160) when the index being used is mapping_package. rank_index can better select what is required so the row count should be lesser in rank_index.
这是否意味着对于给定查询,mapping_package 索引比 rank_index 好?
So does this mean mapping_package index is better than rank_index for the given query?
sectionName 是 varchar 是否有任何影响,因此两个索引应该提供相似的性能?
Does it have any effect that sectionName is a varchar so both indexes should give similar performance?
我还假设 Using index condition
只从索引中选择几行并扫描更多行.而如果 使用 where;使用索引
,优化器只需要读取索引而不是表来获取行,然后选择一些数据.那么为什么 Using where
在使用 rank_index 时会丢失?
Also I am assuming Using index condition
is selecting only few rows from index and scanning some more. While in case Using where; Using index
, optimizer has to only read the index and not table to get rows and then it is selecting some data. Then why Using where
is missing while using rank_index?
而且为什么mapping_package的key_len是4,索引只有两列?
Moreover why is the key_len for mapping_package is 4 when there are only two columns in the index?
感谢帮助.
推荐答案
(19434<20160)
-- 这两个数字都是估计值.他们离得这么近是不寻常的.我敢打赌,如果你做了ANALYZE TABLE
,两者都会改变,可能会改变不等式.
(19434<20160)
-- Both of those numbers are estimates. It is unusual for them to be that close. I'll bet if you did ANALYZE TABLE
, both would change, possibly changing the inequality.
注意别的:Using where;使用索引
vs 使用索引条件
.
但首先,让我提醒您,在 InnoDB 中,PRIMARY KEY
列附加到辅助键上.所以,实际上你有
But first, let me remind you that, in InnoDB, the PRIMARY KEY
columns are tacked onto the secondary key. So, effectively you have
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`,`email`)
KEY `mapping_package` (`mappingId`,`packageId`,`email`,`sectionName`)
现在让我们决定最佳索引应该是什么:其中 mappingId = ?1和 packageId = ?2和部分名称 = ?3按标记排序获得的描述
Now let's decide what the optimal index should be: where mappingId = ?1 and packageId = ?2 and sectionName = ?3 order by marksObtained desc
- 首先,
WHERE
的=
部分:mappingId
、packageId
、sectionName
,以任何顺序; - 然后是
ORDER BY
列:marksObtained
- 奖励:最后,如果
email
(SELECT
中唯一提到的其他列)在键中,它将是Covering".
- First, the
=
parts ofWHERE
:mappingId
,packageId
,sectionName
, in any order; - Then the
ORDER BY
column(s):marksObtained
- Bonus: Finally if
email
(the only other column mentioned anywhere in theSELECT
) is in the key, it will be "Covering".
这说明 rank_index
是完美的",而其他索引则不太好.唉,EXPLAIN
并没有明确说明这一点.
This says that rank_index
is "perfect", and the other index is not so good. Alas, EXPLAIN
does not clearly say that.
你也可以想到这一点——你所需要的只是研究我的博客:http://mysql.rjweb.org/doc.php/index_cookbook_mysql(抱歉,时间不早了,我有点厚脸皮了.)
You, too, could have figured this out -- all you needed is to study my blog: http://mysql.rjweb.org/doc.php/index_cookbook_mysql (Sorry; it's getting late, and I am getting cheeky.)
其他提示:
- 不要盲目使用
(255)
.当需要 tmp 表时,这会使 tmp 表更大,因此效率较低.将限制降低到合理的程度.或者... - 如果这是一个巨大的表,你真的应该规范化"字符串,用 2 字节的
SMALLINT UNSIGNED
替换它们.这将通过其他方式提高性能,例如减少昂贵的 I/O.(好吧,20 行非常小,所以这可能不适用.)
- Don't blindly use
(255)
. When a tmp table is needed, this can make the the tmp table bigger, hence less efficient. Lower the limit to something reasonable. Or... - If this is a huge table, you really ought to 'normalize' the strings, replacing them with maybe a 2-byte
SMALLINT UNSIGNED
. This will improve performance in other ways, such as decreasing costly I/O. (OK, 20 rows is pretty small, so this may not apply.)
为什么 key_len
是 4?这意味着使用了一列,即 4 字节的 INT
mappingId
.我也希望它使用第二列.所以,我很难过.EXPLAIN FORMAT=JSON SELECT ...
可能会提供更多线索.
Why is key_len
4? That implies that one column was used, namely the 4-byte INT
mappingId
. I would have expected it to use the second column, too. So, I am stumped. EXPLAIN FORMAT=JSON SELECT ...
may provide more clues.
这篇关于MySQL没有从索引中选择正确的行数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!