为什么IN条件比“=”更慢。在sql? [英] Why would an IN condition be slower than "=" in sql?

查看:291
本文介绍了为什么IN条件比“=”更慢。在sql?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

检查问题此SELECT查询需要180秒完成

Check the question This SELECT query takes 180 seconds to finish (check the comments on the question itself).
The IN get to be compared against only one value, but still the time difference is enormous.
Why is it like that?

推荐答案

摘要:这是一个已知问题在MySQL和固定在MySQL 5.6.x.问题是由于在使用IN的子查询未正确识别为依赖子查询而不是独立子查询时缺少优化。

Summary: This is a known problem in MySQL and was fixed in MySQL 5.6.x. The problem is due to a missing optimization when a subquery using IN is incorrectly indentified as dependent subquery instead of an independent subquery.

当你对原始查询运行EXPLAIN时,它返回:

When you run EXPLAIN on the original query it returns this:


1  'PRIMARY'             'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
2  'DEPENDENT SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
3  'DEPENDENT SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where'

当您更改 IN = ,即可得到:

When you change IN to = you get this:


1  'PRIMARY'   'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
2  'SUBQUERY'  'question_law_version'  'ALL'  ''  ''  ''  ''  10148  'Using where'
3  'SUBQUERY'  'question_law'          'ALL'  ''  ''  ''  ''  10040  'Using where'

每个从属子查询每次运行一次它包含在查询中的行,而子查询只运行一次。当有一个条件可以转换为连接时,MySQL有时可以优化依赖子查询,但这里不是这样。

Each dependent subquery is run once per row in the query it is contained in, whereas the subquery is run only once. MySQL can sometimes optimize dependent subqueries when there is a condition that can be converted to a join but here that is not the case.

现在这个问题为什么MySQL认为IN版本需要是一个从属的子查询。我已经做了查询的简化版本,以帮助调查这一点。我创建了两个表'foo'和'bar',其中前者只包含id列,后者包含id和foo id(虽然我没有创建外键约束)。然后我填充两个表1000行:

Now this of course leaves the question of why MySQL believes that the IN version needs to be a dependent subquery. I have made a simplified version of the query to help investigate this. I created two tables 'foo' and 'bar' where the former contains only an id column, and the latter contains both an id and a foo id (though I didn't create a foreign key constraint). Then I populated both tables with 1000 rows:

CREATE TABLE foo (id INT PRIMARY KEY NOT NULL);
CREATE TABLE bar (id INT PRIMARY KEY, foo_id INT NOT NULL);

-- populate tables with 1000 rows in each

SELECT id
FROM foo
WHERE id IN
(
    SELECT MAX(foo_id)
    FROM bar
);

这个简化的查询与以前的问题相同 - 内部选择被视为依赖子查询,没有执行优化,使得内部查询每行运行一次。查询几乎需要运行一秒钟。将 IN 更改为 = 可让查询几乎立即运行。

This simplified query has the same problem as before - the inner select is treated as a dependent subquery and no optimization is performed, causing the inner query to be run once per row. The query takes almost one second to run. Changing the IN to = again allows the query to run almost instantly.

我用来填充表格的代码如下,以防任何人想重现结果。

The code I used to populate the tables is below, in case anyone wishes to reproduce the results.

CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;

CALL prc_filler(1000);

INSERT foo SELECT id FROM filler;
INSERT bar SELECT id, id FROM filler;

这篇关于为什么IN条件比“=”更慢。在sql?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆