为什么MySQL并不总是将索引用于选择查询? [英] Why does MySQL not always use index for select query?
问题描述
我的数据库用户和文章中有两个表。
I have two tables in my database users and articles.
用户和文章表中的记录如下:
Records in my users and articles table are given below:
+----+--------+
| id | name |
+----+--------+
| 1 | user1 |
| 2 | user2 |
| 3 | user3 |
+----+--------+
+----+---------+----------+
| id | user_id | article |
+----+---------+----------+
| 1 | 1 | article1 |
| 2 | 1 | article2 |
| 3 | 1 | article3 |
| 4 | 2 | article4 |
| 5 | 2 | article5 |
| 6 | 3 | article6 |
+----+---------+----------+
下面的查询和受人尊敬的 EXPLAIN
输出。
Given below the queries and the respected EXPLAIN
output.
EXPLAIN SELECT * FROM articles WHERE user_id = 1;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | articles | NULL | ALL | user_id | NULL | NULL | NULL | 6 | 50.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
EXPLAIN SELECT * FROM articles WHERE user_id = 2;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | articles | NULL | ref | user_id | user_id | 5 | const | 2 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
EXPLAIN SELECT * FROM articles WHERE user_id = 3;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | articles | NULL | ref | user_id | user_id | 5 | const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
看我选择查询的 EXPLAIN
计划,看来查询并不总是使用索引。
Looking at the EXPLAIN
plans for my select queries, it seems that queries are not always using the indexes.
以防万一,
当 user_id
为1时,它不使用键并扫描整个表。
when user_id
is 1, it doesn't use the key and scans the complete table.
否则,它使用 user_id
键,仅扫描几行。
otherwise, it uses the user_id
key and scans only few rows.
您能解释一下为什么查询并不总是在这里使用索引吗?
Could you please explain why queries don't always use the index here?
推荐答案
您显示的查询中可能包含两个BTree。一个用于数据的BTree,按 PRIMARY KEY
排序,我假设它是 id
。另一个用于 user_id
上的 INDEX
(我想也是)。当InnoDB(我假设您正在使用)构建二级索引(例如 INDEX(user_id)
)时,它会默默地附加在表的PK上。因此,实际上,它成为包含两个列的 BTree
:(user_id,id)
并按该对进行排序。
There are (probably) two BTrees involved in the queries you show. One BTree for the data, sorted by the PRIMARY KEY
, which I assume is id
. The other for the INDEX
on user_id
(again, I am guessing). When InnoDB (which I assume you are using) builds a "secondary index", such as INDEX(user_id)
, it silently tacks on the PK of the table. So, effectively it becomes a BTree
containing two columns: (user_id, id)
and sorted by that pair.
优化器查看 SELECT * FROM t WHERE user_id =?
时,它对表进行了调查,发现很多行中的 user_id = 1
并且没有多少行具有您尝试过的其他值。
When the Optimizer looks at SELECT * FROM t WHERE user_id=?
, it probed the table and discovered that "a lot" of rows had user_id = 1
and not many rows had the other values you tried.
优化器有两个(或更多种)评估查询的方法-
The Optimizer has two (or more) ways to evaluate the queries like that --
计划A(使用索引):这是它的作用:
Plan A (use the index): Here's what it does:
- 钻取索引的BTree来查找
user_id = 2
的第一次出现。 - 会在其中找到
id
。 - 使用该
id
向下钻取数据的BTree来找到*
(如SELECT *
中所示)。 - 移至索引BTree中的下一个条目。 (这实际上是相当有效的,因为它实际上是 B +树;请参阅Wikipedia。)
- 如果找到,则循环回到步骤2。
user_id = 2
),退出。
- Drill down the Index's BTree to find the first occurrence of
user_id=2
. - There it will find an
id
. - Use that
id
to drill down the data's BTree to find*
(as inSELECT *
). - Move on to the next entry in the Index BTree. (This is actually rather efficient since it is really a "B+Tree"; see Wikipedia.)
- If found, loop back to step 2. If not found (no more index entries with
user_id=2
), exit.
方案B(请勿使用索引-对您的 user_id = 1
很有用):
Plan B (don't use the index -- useful for your user_id=1
):
- 只需浏览一下
- 跳过没有
user_id = 1
的任何行。
- Simply walk through the data BTree in whatever order.
- Skip any row that does not have
user_id=1
.
两个BTree之间来回弹跳会花费一些时间。优化程序确定您的 = 1
案例需要查看表的20%以上,并确定计划B会更快。也就是说,它故意忽略了INDEX。
The bouncing back and forth between the two BTrees costs something. The Optimizer decided your =1
case would need to look at more than about 20% of the table and decided that plan B would be faster. That is, it deliberately ignored the INDEX.
Optimizer无法或不能正确估计很多因素,但是通常 在这两个计划之间进行选择可以加快执行速度。 (您的表太小,无法可靠地衡量差异。)
There are a lot of factors that the Optimizer can't or doesn't estimate correctly, but generally picking between these two Plans leads to faster execution. (Your table is too small to reliably measure a difference.)
其他计划-如果索引为 covering,则无需使用数据B树。如果存在可以使用的 ORDER BY
,则优化程序将可能使用计划A来避免文件排序。 (请参见 EXPLAIN SELECT ...
)等等。
Other "Plans" -- If the index is "covering", there is no need to use the data BTree. If there is an ORDER BY
that can be used, then the Optimizer will probably use Plan A to avoid the "filesort". (See EXPLAIN SELECT ...
) Etc.
这篇关于为什么MySQL并不总是将索引用于选择查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!