Does MySQL optimize subqueries automatically?


Question

I want to run the following query:

-- Main Query    
SELECT COUNT(*) FROM table_name WHERE device_id IN 
     (SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA')

The following query (the subquery from the Main Query):

SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA'

executes in 7 seconds and returns 2,691 rows from a table of 2.1M rows.

I ran the Main Query above, and it is still executing after more than 5 minutes of waiting.

Finally, I executed the subquery separately, took the 2,691 values from its result, and executed the following query:

-- Main Query (improvised)    
SELECT COUNT(*) FROM table_name WHERE device_id IN 
     ("device_id_1", "device_id_2", ....., "device_id_2691")
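The two-step workaround above can be scripted instead of pasting 2,691 literals by hand. This is a minimal sketch, assuming Python's standard-library sqlite3 module as a stand-in for a MySQL driver so that it runs anywhere; count_in_two_steps and make_demo_db are hypothetical helper names, and the toy rows are made up.

```python
import sqlite3

# Assumption: sqlite3 stands in for a MySQL driver so the sketch runs anywhere.
# MySQL drivers for Python typically use %s placeholders instead of ?.


def count_in_two_steps(conn, name):
    """Hypothetical helper: run the subquery first, then an IN (value_list) query."""
    cur = conn.cursor()
    # Step 1: the subquery on its own, collecting the distinct device ids.
    cur.execute("SELECT DISTINCT device_id FROM table_name WHERE NAME = ?", (name,))
    ids = [row[0] for row in cur.fetchall()]
    if not ids:
        # An empty IN () list is a syntax error in MySQL, and nothing can match anyway.
        return 0
    # Step 2: one placeholder per id, so the values are bound rather than pasted in.
    placeholders = ", ".join("?" for _ in ids)
    cur.execute(
        f"SELECT COUNT(*) FROM table_name WHERE device_id IN ({placeholders})", ids
    )
    return cur.fetchone()[0]


def make_demo_db():
    """Hypothetical toy data; the real table has about 2.1M rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (device_id TEXT, NAME TEXT)")
    conn.executemany(
        "INSERT INTO table_name VALUES (?, ?)",
        [("d1", "SOME_PARA"), ("d1", "OTHER"), ("d2", "OTHER"),
         ("d3", "SOME_PARA"), ("d3", "SOME_PARA")],
    )
    return conn


if __name__ == "__main__":
    print(count_in_two_steps(make_demo_db(), "SOME_PARA"))  # 4
```

Binding one placeholder per value avoids quoting problems. Very long lists can still hit a limit on bound parameters: older SQLite builds cap it at 999, for example.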

Surprisingly, this gave me an answer within 40 seconds.

What gives? Why doesn't MySQL use the same technique that I used and give an answer quickly? Am I doing something wrong?

Answer

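Unfortunately, MySQL is not very good at optimizing subqueries with IN. This is from the MySQL documentation (the pre-5.6 manuals; MySQL 5.6 added semijoin and materialization strategies for IN subqueries):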

Subquery optimization for IN is not as effective as for the = operator or for the IN(value_list) operator.

A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.

The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:

SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);

The optimizer rewrites the statement to a correlated subquery:

SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);

If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.

An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.
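The M×N versus M+N difference can be made concrete with a toy cost model. This sketch is not MySQL's executor. It only counts comparisons for two hypothetical strategies. The first re-scans the inner rows for every outer row, which is roughly what the EXISTS rewrite costs when no index on t2.b helps. The second evaluates the inner rows once into a hash set and then probes that set. The function names are made up for illustration.

```python
# Toy cost model (not MySQL's executor): count comparisons made while finding
# the outer values that also appear among the inner values.


def correlated_exists(outer, inner):
    """Re-scan the inner rows for every outer row, like EXISTS re-run per row."""
    comparisons = 0
    matched = 0
    for a in outer:              # N outer rows
        for b in inner:          # up to M inner rows, every single time
            comparisons += 1
            if b == a:
                matched += 1
                break            # EXISTS can stop at the first match
    return matched, comparisons


def uncorrelated_in(outer, inner):
    """Evaluate the subquery once into a hash set, then probe it per outer row."""
    comparisons = 0
    values = set()
    for b in inner:              # M steps, done only once
        values.add(b)
        comparisons += 1
    matched = 0
    for a in outer:              # N probes, each expected O(1)
        comparisons += 1
        if a in values:
            matched += 1
    return matched, comparisons


if __name__ == "__main__":
    outer = list(range(1000))        # N = 1000 outer rows
    inner = list(range(990, 1000))   # M = 10 subquery rows
    print(correlated_exists(outer, inner))
    print(uncorrelated_in(outer, inner))
```

With 1,000 outer rows and 10 inner rows, both strategies find the same 10 matches. The re-scan strategy makes 9,955 comparisons and the set-based one makes 1,010, and the gap widens as M grows.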

Try using a JOIN.

Because MySQL works from the inside out, sometimes you can trick MySQL by wrapping the subquery inside yet another subquery like so:

SELECT COUNT(*) FROM table_name WHERE device_id IN
     (SELECT * FROM (SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA') tmp)

Here's the JOIN solution:

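-- assumes id is unique per row (e.g. the primary key); DISTINCT undoes the duplicates the self-join creates when a device has several 'SOME_PARA' rows
SELECT COUNT(DISTINCT t2.id) FROM table_name t1
  JOIN table_name t2
    ON t2.device_id = t1.device_id
  WHERE t1.NAME = 'SOME_PARA'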

Notice that here, too, I start from the inside and work outward.
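The following sketch checks that the three formulations count the same rows: the original IN subquery, the wrapped derived-table version, and the JOIN. It uses Python's sqlite3 module on made-up data, so it checks results only, not speed, and SQLite's planner differs from MySQL's. It also shows why the JOIN uses COUNT(DISTINCT t2.id). A device with several 'SOME_PARA' rows would otherwise have its rows counted more than once. The id column is an assumption, because the question never shows the schema. make_demo_db and run_all are hypothetical helpers.

```python
import sqlite3

# Assumption: sqlite3 is used only to confirm the three queries agree on a toy
# table; it says nothing about how MySQL plans or times them. The JOIN version
# relies on id being unique per row (here, an INTEGER PRIMARY KEY).

QUERIES = {
    "in_subquery": """
        SELECT COUNT(*) FROM table_name WHERE device_id IN
            (SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA')
    """,
    "wrapped_subquery": """
        SELECT COUNT(*) FROM table_name WHERE device_id IN
            (SELECT * FROM (SELECT DISTINCT device_id FROM table_name
                            WHERE NAME = 'SOME_PARA') tmp)
    """,
    "join": """
        SELECT COUNT(DISTINCT t2.id) FROM table_name t1
          JOIN table_name t2 ON t2.device_id = t1.device_id
          WHERE t1.NAME = 'SOME_PARA'
    """,
}


def make_demo_db():
    """Hypothetical toy table mirroring the question's column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_name (id INTEGER PRIMARY KEY, device_id TEXT, NAME TEXT)"
    )
    conn.executemany(
        "INSERT INTO table_name (device_id, NAME) VALUES (?, ?)",
        [
            ("d1", "SOME_PARA"), ("d1", "OTHER"),
            ("d2", "OTHER"),
            # d3 has two SOME_PARA rows, so the self-join yields each of d3's
            # rows twice; COUNT(DISTINCT t2.id) collapses those duplicates.
            ("d3", "SOME_PARA"), ("d3", "SOME_PARA"), ("d3", "OTHER"),
        ],
    )
    return conn


def run_all(conn):
    """Run every query and return {query name: count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}


if __name__ == "__main__":
    print(run_all(make_demo_db()))
```

On this data all three return 5, while a plain COUNT(*) over the same JOIN would return 8.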
