使用不使用INDEXes的查询变量进行SELECT [英] SELECT with query variables not using INDEXes

查看:159
本文介绍了使用不使用INDEXes的查询变量进行SELECT的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在玩(出于兴趣),在一个简单的邻接列表中检索一个节点树,并使用局部变量进行递归查询。

I was playing around (out of interest) with retrieving a tree of nodes in a simple adjacency list with a recursive query using local variables.

我到目前为止的解决方案很有趣但我不知道(这是我唯一的问题)为什么MySQL拒绝使用任何 INDEX 优化此查询。 MySQL不能使用 INDEX 查找最近的孩子吗?

The solution i have so far is fun but i wonder (and this is my only question) why MySQL refuses to use any INDEX to optimize this query. Shouldn't MySQL be able to lookup the nearest child(s) by using an INDEX?

我是好奇为什么MySQL没有。即使我使用 FORCE INDEX ,执行计划也不会改变。

I'm curious why MySQL doesn't. Even when i use FORCE INDEX the execution plan doesn't change.

这是迄今为止的查询, 5 是父节点的ID:

This is the query so far, with 5 being the ID of the parent node:

SELECT 
  @last_id := id AS id,
  parent_id,
  name,
  @depth := IF(parent_id = 5, 1, @depth + 1) AS depth
FROM 
  tree FORCE INDEX (index_parent_id, PRIMARY, index_both),
  (SELECT @last_id := 5, @depth := -1) vars
WHERE id = 5 OR parent_id = @last_id OR parent_id = 5




在SQLfiddle尝试实例

请注意原因不能是小数据集,因为当我指定 FORCE INDEX(id) FORCE INDEX(parent_id)时,行为不会改变 FORCE INDEX(id,parent_id) ...

Note that the reason can't be the small dataset, because the behaviour doesn't change when i specify FORCE INDEX (id) or FORCE INDEX (parent_id) or FORCE INDEX (id, parent_id) ...

文档说:


您也可以使用FORCE INDEX,它的作用类似于USE INDEX(index_list),但是假设表扫描非常昂贵。换句话说,只有在无法使用某个给定索引查找表中的行时才使用表扫描。

You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table.

必须有一些东西让查询无法使用INDEX,但我不明白它是什么。

There must be something that renders the query unable to use the INDEX, but i don't understand what it is.

免责声明:我知道在SQL中存储和检索分层数据有不同的方法。我知道嵌套集模型。我不是在寻找替代实现。我不是在寻找嵌套集。

Disclaimer: I know there are different ways to store and retrieve hierarchical data in SQL. I know about the nested sets model. I'm not looking for an alternative implementation. I'm not looking for nested sets.

我也知道查询本身就是坚果并产生错误的结果。

我只想了解(详细)为什么MySQL在这种情况下没有使用 INDEX

I just want to understand (in detail) why MySQL is not using an INDEX in this case.

推荐答案

原因在于 WHERE OR 条件>子句。

The reason lies within the use of OR conditions in the WHERE clause.

为了说明,请尝试再次运行查询,这次只有 id = 5 条件,并获取(EXPLAIN输出):

To illustrate, try running the query again, this time with only the id = 5 condition, and get (EXPLAIN output):

+----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+
| id | select_type | table      | type   | possible_keys      | key     | key_len | ref   | rows | Extra          |
+----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+
|  1 | PRIMARY     | <derived2> | system | NULL               | NULL    | NULL    | NULL  |    1 |                |
|  1 | PRIMARY     | tree       | const  | PRIMARY,index_both | PRIMARY | 4       | const |    1 |                |
|  2 | DERIVED     | NULL       | NULL   | NULL               | NULL    | NULL    | NULL  | NULL | No tables used |
+----+-------------+------------+--------+--------------------+---------+---------+-------+------+----------------+

再次,这次只有 parent_id = @last_id OR parent_id = 5 条件,并获得:

And again, this time with only the parent_id = @last_id OR parent_id = 5 condition, and get:

+----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+
| id | select_type | table      | type   | possible_keys   | key  | key_len | ref  | rows | Extra          |
+----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+
|  1 | PRIMARY     | <derived2> | system | NULL            | NULL | NULL    | NULL |    1 |                |
|  1 | PRIMARY     | tree       | ALL    | index_parent_id | NULL | NULL    | NULL |   10 | Using where    |
|  2 | DERIVED     | NULL       | NULL   | NULL            | NULL | NULL    | NULL | NULL | No tables used |
+----+-------------+------------+--------+-----------------+------+---------+------+------+----------------+

MySQL处理多个索引不太好在同一个查询中。 AND条件下情况略好一些;一个人更有可能看到 index_merge 优化而不是索引联盟优化。

MySQL is not too good with handling multiple indexes in the same query. Things are slightly better with AND conditions; one is more likely to see an index_merge optimization than an index union optimization.

随着版本的推进,情况正在改善,但我已经测试了你在版本 5.5 上的查询,是当前最新的生产版本,结果如​​您所述。

Things are improving as versions advance, but I've tested you query on version 5.5, which is at current latest production version, and the results are as you describe.

要解释为什么这很困难,请考虑:两个不同的索引将回答两个不同的条件查询。一个将回答 id = 5 ,另一个回答 parent_id = @last_id或parent_id = 5 (BTW没问题在后者内部的 OR ,因为这两个术语都是在同一个索引中处理的。)

To explain why this is difficult, consider: two different indexes will answer for two different conditions of the query. One will answer for id = 5, the other for parent_id = @last_id OR parent_id = 5 (BTW no issue with the OR inside the latter, since both terms are handled from within the same index).

没有一个索引可以回答这两个,因此忽略 FORCE INDEX 指令。请参阅 FORCE INDEX 表示MySQL必须在表扫描中使用 索引。这并不意味着它必须在表扫描中使用多个索引。

There is no single index that can answer for both, and therefore the FORCE INDEX instruction is ignored. See, FORCE INDEX says MySQL has to use an index over a table scan. It does not imply it has to use more than one index over a table scan.

因此MySQL遵循此处文档的规则。但为什么这么复杂呢?因为要使用这两个索引来回答,MySQL必须从两者中收集结果,将一个存储在一些临时缓冲区中,同时管理第二个。然后必须通过该缓冲区来过滤掉相同的行(某些行可能适合所有条件)。然后扫描该缓冲区以便返回结果。

So MySQL follows the rules of the documentation here. But why is this so complicated? Because to answer using both indexes, MySQL has to gather results from both, store one's aside in some temporary buffer while managing the second. Then is has to go over that buffer to filter out identical rows (it is possible that some row fits all conditions). And then to scan that buffer so as to return the results.

但等等,该缓冲区本身没有编入索引。过滤重复项不是一项明显的任务。所以MySQL更喜欢在原始表上工作并在那里进行扫描,并避免一切混乱。

But wait, that buffer is in itself not indexed. Filtering duplicates is not an obvious task. So MySQL prefers to work on the original table and do the scan there, and avoid all that mess.

当然这是可以解决的。 Oracle的工程师可能会改进这一点(最近他们一直在努力改进查询执行计划),但我不知道这是否在TODO任务上,或者它是否具有高优先级。

Of course this is solvable. The engineers at Oracle may yet improve this (recently they have been working hard at improving query execution plans), but I don't know if this is on the TODO task, or if it has a high priority.

这篇关于使用不使用INDEXes的查询变量进行SELECT的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆