Does MySQL optimize subqueries automatically?
Question
I want to run the following query:
-- Main Query
SELECT COUNT(*) FROM table_name WHERE device_id IN
(SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA')
The following query (the subquery from the main query):
SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA'
executes in 7 seconds and returns 2691 rows from a table of 2.1M rows.
I ran the main query above, and it was still executing after more than 5 minutes of waiting.
Finally, I executed the subquery separately, took the 2691 values from its result, and ran the following query:
-- Main Query (improvised)
SELECT COUNT(*) FROM table_name WHERE device_id IN
("device_id_1", "device_id_2", ....., "device_id_2691")
Surprisingly, this gave me an answer within 40 seconds.
What gives? Why doesn't MySQL use the same technique that I used and give an answer quickly? Am I doing something wrong?
Recommended answer
Unfortunately, MySQL is not very good at optimizing subqueries with IN. This is from the MySQL documentation:
Subquery optimization for IN is not as effective as for the = operator or for the IN(value_list) operator.
A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.
An implication is that an IN subquery can be much slower than a query written using an IN(value_list) operator that lists the same values that the subquery would return.
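The difference between the two cost estimates quoted above can be illustrated outside the database. The following is only a rough Python sketch, not MySQL's actual executor: `count_nested` mimics the correlated EXISTS form by rescanning the inner rows for every outer row, while `count_with_lookup` mimics evaluating the subquery once and probing its result. The function names and the data are made up for illustration.

```python
def count_nested(outer_rows, inner_rows):
    """Correlated-style evaluation: rescan the inner rows for every outer row."""
    matches = 0
    comparisons = 0
    for a in outer_rows:
        for b in inner_rows:
            comparisons += 1
            if a == b:
                matches += 1
                break  # EXISTS stops at the first matching inner row
    return matches, comparisons


def count_with_lookup(outer_rows, inner_rows):
    """Uncorrelated-style evaluation: read the inner rows once, then probe."""
    lookup = set(inner_rows)      # one pass over the M inner rows
    steps = len(inner_rows)
    matches = 0
    for a in outer_rows:          # one pass over the N outer rows
        steps += 1
        if a in lookup:
            matches += 1
    return matches, steps


# Hypothetical data: N = 20000 outer rows, M = 100 inner rows.
outer = [i % 1000 for i in range(20000)]
inner = list(range(900, 1000))

nested_matches, nested_work = count_nested(outer, inner)
lookup_matches, lookup_work = count_with_lookup(outer, inner)
print("nested:", nested_matches, "matches,", nested_work, "comparisons")
print("lookup:", lookup_matches, "matches,", lookup_work, "steps")
```

Both functions return the same match count, but the nested version does work proportional to M×N while the lookup version does work proportional to M+N, which is the gap the documentation describes.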
Try using a JOIN instead.
Because MySQL works from the inside out, sometimes you can trick MySQL by wrapping the subquery inside yet another subquery like so:
SELECT COUNT(*) FROM table_name WHERE device_id IN
(SELECT * FROM (SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA') tmp)
Here's the JOIN solution:
SELECT COUNT(DISTINCT t2.id) FROM table_name t1
JOIN table_name t2
ON t2.device_id = t1.device_id
WHERE t1.NAME = 'SOME_PARA'
Notice that here, too, I start from the inside and work outward: the filtered rows (t1) come first, and the rows to be counted (t2) are joined to them.
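Before swapping one form for another, it may be worth checking that the rewrites return the same count. The sketch below does that in Python with an in-memory SQLite database standing in for MySQL, so it says nothing about MySQL performance; it only compares results. The table layout (an `id` primary key, which the JOIN version already assumes, plus `device_id` and `NAME`) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    "id INTEGER PRIMARY KEY, device_id TEXT, NAME TEXT)"
)
# Hypothetical sample data: dev_a and dev_c have a 'SOME_PARA' row, dev_b does not.
rows = [
    ("dev_a", "SOME_PARA"), ("dev_a", "SOME_PARA"), ("dev_a", "OTHER"),
    ("dev_b", "OTHER"), ("dev_b", "OTHER"),
    ("dev_c", "SOME_PARA"), ("dev_c", "OTHER"),
]
conn.executemany("INSERT INTO table_name (device_id, NAME) VALUES (?, ?)", rows)


def scalar(sql, params=()):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql, params).fetchone()[0]


SUBQUERY = "SELECT DISTINCT device_id FROM table_name WHERE NAME = 'SOME_PARA'"

# 1. The original IN (subquery) form.
in_subquery = scalar(
    f"SELECT COUNT(*) FROM table_name WHERE device_id IN ({SUBQUERY})"
)

# 2. The subquery wrapped in a derived table.
wrapped = scalar(
    "SELECT COUNT(*) FROM table_name WHERE device_id IN "
    f"(SELECT * FROM ({SUBQUERY}) tmp)"
)

# 3. The self-join form; DISTINCT on id avoids counting a row once per match.
joined = scalar(
    "SELECT COUNT(DISTINCT t2.id) FROM table_name t1 "
    "JOIN table_name t2 ON t2.device_id = t1.device_id "
    "WHERE t1.NAME = 'SOME_PARA'"
)

# 4. The manual two-step approach from the question: fetch the ids, then
#    pass them as an explicit IN (value_list) using bound parameters.
ids = [r[0] for r in conn.execute(SUBQUERY)]
placeholders = ", ".join("?" for _ in ids)
two_step = scalar(
    f"SELECT COUNT(*) FROM table_name WHERE device_id IN ({placeholders})", ids
)

print(in_subquery, wrapped, joined, two_step)
```

If `id` were not unique, the `COUNT(DISTINCT t2.id)` in the JOIN form would no longer match `COUNT(*)` in the other forms, so that assumption matters.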