Why does 'HASH JOIN' or 'LOOP JOIN' improve this stored proc?
Question
I have a basic query that goes from 6 seconds to 1 second just by changing one join from LEFT JOIN to LEFT HASH JOIN or LEFT LOOP JOIN. Can anyone explain why this would cause such a large increase in performance, and why SQL Server's optimizer isn't figuring it out on its own?
Here is roughly what the SQL looks like:
SELECT
    a.[ID]
FROM
    [TableA] a
LEFT HASH JOIN
    [TableB] b
    ON b.[ID] = a.[TableB_ID]
JOIN
    [TableC] c
    ON c.[ID] = a.[TableC_ID]
WHERE
    a.[SomeDate] IS NULL AND
    a.[SomeStatus] IN ('X', 'Y', 'Z') AND
    c.[SomethingElse] = 'ABC'
Tables A and B each have millions of records, and there are indexes on all the ID fields. This is on SQL Server 2005.
A colleague suggested a LEFT LOOP JOIN and it seems to have made it even faster... SQL is not one of my strengths, so I am trying to understand how these 'hints' are helping.
Recommended answer
HASH JOIN is useful when a large percentage of the rows contributes to the result set.
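One rough way to see whether "a large percentage of the rows" applies here is to count how many rows of TableA pass the filters from the question. This is only a sketch: it reuses the table and column names from the question and ignores the TableC filter, so it gives an upper bound rather than the exact fraction.

```sql
-- Compare the number of TableA rows that pass the WHERE filters
-- against the total row count of the table.
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE
            WHEN a.[SomeDate] IS NULL
             AND a.[SomeStatus] IN ('X', 'Y', 'Z') THEN 1
            ELSE 0
        END) AS qualifying_rows
FROM
    [TableA] a;
```

If qualifying_rows is a sizeable share of total_rows, one pass over each table plus a hash table tends to beat millions of individual index lookups.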
In your case, building a hash table on either A or B and scanning the other table is cheaper than either performing NESTED LOOPS over the index on B.ID or merging the sorted result sets, which is what the optimizer did before the hint.
SQL Server's optimizer did not see that: probably because you didn't gather statistics, or because your data distribution is skewed.
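To check the statistics explanation, it may help to look at when the statistics were last updated and then refresh them. The sketch below assumes the tables are named as in the question; STATS_DATE, sys.indexes and UPDATE STATISTICS are available in SQL Server 2005.

```sql
-- When were the statistics behind each index on TableA last updated?
SELECT
    i.[name] AS index_name,
    STATS_DATE(i.[object_id], i.[index_id]) AS stats_last_updated
FROM
    sys.indexes i
WHERE
    i.[object_id] = OBJECT_ID('TableA');

-- Rebuild the statistics from every row rather than from a sample.
UPDATE STATISTICS [TableA] WITH FULLSCAN;
UPDATE STATISTICS [TableB] WITH FULLSCAN;
```

FULLSCAN reads the whole table, so on tables with millions of rows it can take a while. If the unhinted query picks a better plan afterwards, stale or sampled statistics were a likely cause.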
Update:
Since you mentioned that LOOP JOIN improved the speed as well, it may be that the JOIN order was chosen incorrectly by the optimizer.
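SQL Server documents that a join hint in the FROM clause also enforces the join order for the whole query, which would explain why two very different hints both help. One way to test this, sketched here with the query from the question, is to drop the join hint and force only the order:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- No join hint: the optimizer still picks the physical join operators,
-- but FORCE ORDER makes it join the tables in the order they are written.
SELECT
    a.[ID]
FROM
    [TableA] a
LEFT JOIN
    [TableB] b
    ON b.[ID] = a.[TableB_ID]
JOIN
    [TableC] c
    ON c.[ID] = a.[TableC_ID]
WHERE
    a.[SomeDate] IS NULL AND
    a.[SomeStatus] IN ('X', 'Y', 'Z') AND
    c.[SomethingElse] = 'ABC'
OPTION (FORCE ORDER);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If this runs about as fast as the hinted versions, the join order rather than the join algorithm is probably what the optimizer got wrong.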