MySQL没有从索引中选择正确的行数 [英] MySQL not picking correct row count from index

查看:65
本文介绍了MySQL没有从索引中选择正确的行数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个下表

CREATE TABLE `test_series_analysis_data` (
  `email` varchar(255) NOT NULL,
  `mappingId` int(11) NOT NULL,
  `packageId` varchar(255) NOT NULL,
  `sectionName` varchar(255) NOT NULL,
  `createdAt` datetime(3) DEFAULT NULL,
  `marksObtained` float NOT NULL,
  `updatedAt` datetime DEFAULT NULL,
  `testMetaData` longtext,
  PRIMARY KEY (`email`,`mappingId`,`packageId`,`sectionName`),
  KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`),
  KEY `mapping_package` (`mappingId`,`packageId`)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

以下是查询的解释输出:

Following is the output of the explain for the queries:

explain select rank 
from (
   select email, @i:=@i+1 as rank 
   from test_series_analysis_data ta 
   join (select @i:=0) va 
   where mappingId = ?1 
   and packageId = ?2 
   and sectionName = ?3 
   order by marksObtained desc
) as inter 
where inter.email = ?4;

+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table      | partitions | type   | possible_keys              | key         | key_len | ref   | rows  | filtered | Extra                    |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref    | <auto_key0>                | <auto_key0> | 767     | const |    10 |   100.00 | NULL                     |
|  2 | DERIVED     | <derived3> | NULL       | system | NULL                       | NULL        | NULL    | NULL  |     1 |   100.00 | Using filesort           |
|  2 | DERIVED     | ta         | NULL       | ref    | rank_index,mapping_package | rank_index  | 4       | const | 20160 |     1.00 | Using where; Using index |
|  3 | DERIVED     | NULL       | NULL       | NULL   | NULL                       | NULL        | NULL    | NULL  |  NULL |     NULL | No tables used           |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+

查询优化器可以同时使用这两个索引,但 rank_index 是一个覆盖索引,所以它被选中了.令我惊讶的是以下查询的输出:

Query optimizer could have used both indexes but rank_index is a covering index so it got picked. What surprises me is the output of the following query:

explain select rank 
from ( 
  select email, @i:=@i+1 as rank 
  from test_series_analysis_data ta use index (mapping_package) 
  join (select @i:=0) va 
  where mappingId = ?1 
  and packageId = ?2 
  and sectionName = ?3 
  order by marksObtained desc
) as inter 
where inter.email = ?4;

+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table      | partitions | type   | possible_keys   | key             | key_len | ref   | rows  | filtered | Extra                 |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref    | <auto_key0>     | <auto_key0>     | 767     | const |    10 |   100.00 | NULL                  |
|  2 | DERIVED     | <derived3> | NULL       | system | NULL            | NULL            | NULL    | NULL  |     1 |   100.00 | Using filesort        |
|  2 | DERIVED     | ta         | NULL       | ref    | mapping_package | mapping_package | 4       | const | 19434 |     1.00 | Using index condition |
|  3 | DERIVED     | NULL       | NULL       | NULL   | NULL            | NULL            | NULL    | NULL  |  NULL |     NULL | No tables used        |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+

当使用的索引是 mapping_package 时,为什么 rows 更少 (19434<20160).rank_index 可以更好地选择需要的内容,因此 rank_index 中的行数应该更少.

Why are there rows lesser (19434<20160) when the index being used is mapping_package. rank_index can better select what is required so the row count should be lesser in rank_index.

这是否意味着对于给定查询,mapping_package 索引比 rank_index 好?

So does this mean mapping_package index is better than rank_index for the given query?

sectionName 是 varchar 是否有任何影响,因此两个索引应该提供相似的性能?

Does it have any effect that sectionName is a varchar so both indexes should give similar performance?

我还假设 Using index condition 只从索引中选择几行并扫描更多行.而如果 使用 where;使用索引,优化器只需要读取索引而不是表来获取行,然后选择一些数据.那么为什么 Using where 在使用 rank_index 时会丢失?

Also I am assuming Using index condition is selecting only few rows from index and scanning some more. While in case Using where; Using index, optimizer has to only read the index and not table to get rows and then it is selecting some data. Then why Using where is missing while using rank_index?

而且为什么mapping_package的key_len是4,索引只有两列?

Moreover why is the key_len for mapping_package is 4 when there are only two columns in the index?

感谢帮助.

推荐答案

(19434<20160) -- 这两个数字都是估计值.他们离得这么近是不寻常的.我敢打赌,如果你做了ANALYZE TABLE,两者都会改变,可能会改变不等式.

(19434<20160) -- Both of those numbers are estimates. It is unusual for them to be that close. I'll bet if you did ANALYZE TABLE, both would change, possibly changing the inequality.

注意别的:Using where;使用索引 vs 使用索引条件.

但首先,让我提醒您,在 InnoDB 中,PRIMARY KEY 列附加到辅助键上.所以,实际上你有

But first, let me remind you that, in InnoDB, the PRIMARY KEY columns are tacked onto the secondary key. So, effectively you have

KEY `rank_index`      (`mappingId`,`packageId`,`sectionName`,`marksObtained`,`email`)
KEY `mapping_package` (`mappingId`,`packageId`,`email`,`sectionName`)

现在让我们决定最佳索引应该是什么:其中 mappingId = ?1和 packageId = ?2和部分名称 = ?3按标记排序获得的描述

Now let's decide what the optimal index should be: where mappingId = ?1 and packageId = ?2 and sectionName = ?3 order by marksObtained desc

  • 首先,WHERE=部分:mappingIdpackageIdsectionName,以任何顺序;
  • 然后是 ORDER BY 列:marksObtained
  • 奖励:最后,如果 email(SELECT 中唯一提到的其他列)在键中,它将是Covering".
  • First, the = parts of WHERE: mappingId, packageId, sectionName, in any order;
  • Then the ORDER BY column(s): marksObtained
  • Bonus: Finally if email (the only other column mentioned anywhere in the SELECT) is in the key, it will be "Covering".

这说明 rank_index 是完美的",而其他索引则不太好.唉,EXPLAIN 并没有明确说明这一点.

This says that rank_index is "perfect", and the other index is not so good. Alas, EXPLAIN does not clearly say that.

你也可以想到这一点——你所需要的只是研究我的博客:http://mysql.rjweb.org/doc.php/index_cookbook_mysql(抱歉,时间不早了,我有点厚脸皮了.)

You, too, could have figured this out -- all you needed is to study my blog: http://mysql.rjweb.org/doc.php/index_cookbook_mysql (Sorry; it's getting late, and I am getting cheeky.)

其他提示:

  • 不要盲目使用(255).当需要 tmp 表时,这会使 tmp 表更大,因此效率较低.将限制降低到合理的程度.或者...
  • 如果这是一个巨大的表,你真的应该规范化"字符串,用 2 字节的 SMALLINT UNSIGNED 替换它们.这将通过其他方式提高性能,例如减少昂贵的 I/O.(好吧,20 行非常小,所以这可能不适用.)
  • Don't blindly use (255). When a tmp table is needed, this can make the the tmp table bigger, hence less efficient. Lower the limit to something reasonable. Or...
  • If this is a huge table, you really ought to 'normalize' the strings, replacing them with maybe a 2-byte SMALLINT UNSIGNED. This will improve performance in other ways, such as decreasing costly I/O. (OK, 20 rows is pretty small, so this may not apply.)

为什么 key_len 是 4?这意味着使用了一列,即 4 字节的 INT mappingId.我也希望它使用第二列.所以,我很难过.EXPLAIN FORMAT=JSON SELECT ... 可能会提供更多线索.

Why is key_len 4? That implies that one column was used, namely the 4-byte INT mappingId. I would have expected it to use the second column, too. So, I am stumped. EXPLAIN FORMAT=JSON SELECT ... may provide more clues.

这篇关于MySQL没有从索引中选择正确的行数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆