MySQL: Why does a 5th ID in the IN clause drastically change the query plan?


Question

Given the following two queries:

Query #1

SELECT log.id
FROM log
WHERE user_id IN
      (188858, 188886, 189854, 203623, 204072)
      and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

Query #2 (4 IDs instead of 5)

SELECT log.id
FROM log
WHERE user_id IN
      (188858, 188886, 189854, 203623)
      and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

EXPLAIN output

id  select_type  table  type   possible_keys           key                     key_len  ref  rows   Extra
-- Query #1
1   SIMPLE       log    range  idx_user_id_and_log_id  idx_user_id_and_log_id  4             41280  Using index condition; Using where; Using filesort
-- Query #2
1   SIMPLE       log    index  idx_user_id_and_log_id  PRIMARY                 4             53534  Using where

Why does adding a single ID make the execution plan so different? I'm talking about a difference in run time between milliseconds and ~1 minute. I thought it could be related to the eq_range_index_dive_limit parameter, but the list is below 10 entries (the default) anyway. I know that I can force the use of the secondary index instead of the clustered index, but I want to know why MySQL made this choice.

Should I try to understand this? Or is it sometimes simply not possible to understand the query planner's decisions?

Additional details

  • Table size: 11GB
  • Rows: 108 million
  • MySQL: 5.6.7
  • It doesn't matter which ID is removed from the IN clause.
  • Index: idx_user_id_and_log_id(user_id, id)

Recommended answer

As you have shown, MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:

  1. Read all qualifying rows, sort them, and pick the top n rows (the range scan with "Using filesort" in your EXPLAIN output for Query #1).
  2. Read the rows in sorted order and stop when n qualifying rows have been found (the index scan on PRIMARY in your EXPLAIN output for Query #2).
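
To compare the two plans on the same query, one option is to pin the access path with an index hint and time each variant. The sketch below reuses the five-ID query from the question; FORCE INDEX is only a hint, so it is worth confirming with EXPLAIN that each variant really produces the plan named in its comment.

```sql
-- Plan 1 (expected): range scan on the secondary index, then filesort.
SELECT log.id
FROM log FORCE INDEX (idx_user_id_and_log_id)
WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
  AND type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

-- Plan 2 (expected): walk the primary key in descending id order
-- and stop once 25 qualifying rows have been found.
SELECT log.id
FROM log FORCE INDEX (PRIMARY)
WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
  AND type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
```

Prefixing either statement with EXPLAIN shows whether the hint was honored before you compare timings.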

To decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed, or for columns whose values are correlated. In your case, one probably has to read much more of the table in sorted order to find the first 25 qualifying rows than the optimizer expected.
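
One way to check how far the optimizer's expectation is from reality is to measure the filtering effect directly. The queries below are a rough sketch using the table and values from the question; the aliases (qualifying_rows, newest_id, oldest_of_top_25, t) are made up for illustration, and reading the id gap as a row count assumes ids are roughly dense.

```sql
-- How many rows actually satisfy the WHERE condition?
SELECT COUNT(*) AS qualifying_rows
FROM log
WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
  AND type IN (14, 15, 17);

-- Where does an ordered scan on the primary key start?
SELECT MAX(id) AS newest_id
FROM log;

-- Where would it be allowed to stop? This is the smallest id
-- among the 25 newest qualifying rows.
SELECT MIN(t.id) AS oldest_of_top_25
FROM (
  SELECT log.id
  FROM log FORCE INDEX (idx_user_id_and_log_id)
  WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
    AND type IN (14, 15, 17)
  ORDER BY log.id DESC
  LIMIT 25
) AS t;
```

The difference between newest_id and oldest_of_top_25 approximates how many rows the ordered scan has to pass. If the qualifying rows are concentrated among older ids rather than spread evenly through the table, that gap can be far larger than an even spread would suggest.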

There have been several improvements in how LIMIT queries are handled, both in later releases of 5.6 (you are running a pre-GA release!) and in newer versions (5.7, 8.0). I suggest you upgrade to a later release and see whether this is still an issue.

In general, if you want to understand query planner decisions, you should look at the optimizer trace for the query.
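
A minimal sketch of capturing the trace for the five-ID query in a single session follows. The buffer size is an illustrative value, not a recommendation; it is raised only because long traces can be truncated at the default setting.

```sql
-- Turn tracing on for this session and enlarge the trace buffer.
SET optimizer_trace = 'enabled=on';
SET optimizer_trace_max_mem_size = 1048576;  -- illustrative value

-- Run the statement whose plan you want explained.
SELECT log.id
FROM log
WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
  AND type IN (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;

-- Read the trace for the statement just executed. A non-zero
-- MISSING_BYTES_BEYOND_MAX_MEM_SIZE means the trace was truncated.
SELECT TRACE, MISSING_BYTES_BEYOND_MAX_MEM_SIZE
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

-- Turn tracing off again.
SET optimizer_trace = 'enabled=off';
```

The trace is a JSON document. The sections on row estimation and on the execution plans considered are usually the most relevant here, and running the same steps for the four-ID query lets you compare the cost figures side by side. Section names and level of detail may differ between releases.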
