Query optimizer operator choice - nested loops vs hash match (or merge)

Question

One of my stored procedures was taking too long to execute. Looking at the query execution plan, I was able to locate the operation that was taking too long: a Nested Loops physical operator with an outer table of 65,991 rows and an inner table of 19,223 rows. The Nested Loops operator showed estimated rows = 1,268,544,993 (65,991 multiplied by 19,223).

I read a few articles on the physical operators used for joins and got a bit confused about whether nested loops or hash match would have been better in this case. From what I could gather:

Hash Match - used by the optimizer when no useful indexes are available, when one table is substantially smaller than the other, or when the tables are not sorted on the join columns. A hash match might also indicate that a more efficient join method (nested loops or merge join) could be used.

Question: Would hash match be better than nested loops in this scenario?

Thanks

Recommended answer

ABSOLUTELY. A hash match would be a huge improvement. Creating the hash on the smaller 19,223 row table then probing into it with the larger 65,991 row table is a much smaller operation than the nested loop requiring 1,268,544,993 row comparisons.

The most likely reason the server would choose nested loops is that it badly underestimated the number of rows involved. Do your tables have statistics on them, and if so, are they being updated regularly? Statistics are what enable the server to choose good execution plans.
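As a rough sketch of how the statistics could be checked and refreshed, assuming SQL Server and using dbo.TableA and dbo.TableB as placeholder names for the two tables in the join (substitute your own):

```sql
-- Placeholder table names; replace with the tables from your own query.
-- When were the statistics on each table last updated?
SELECT
    OBJECT_NAME(s.object_id)            AS table_name,
    s.name                              AS stats_name,
    STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id IN (OBJECT_ID(N'dbo.TableA'), OBJECT_ID(N'dbo.TableB'));

-- Rebuild the statistics by reading every row.
-- FULLSCAN is more accurate than the default sampling but takes longer on big tables.
UPDATE STATISTICS dbo.TableA WITH FULLSCAN;
UPDATE STATISTICS dbo.TableB WITH FULLSCAN;
```

After refreshing, it is worth re-running the procedure and comparing estimated against actual row counts in the plan before reaching for any hints.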

If you've properly addressed statistics and are still having a problem you could force it to use a HASH join like so:

SELECT *
FROM
   TableA A -- the smaller table
   LEFT HASH JOIN TableB B -- the larger table
      ON A.JoinColumn = B.JoinColumn; -- JoinColumn is a placeholder for your real join column(s)

Please note that as soon as you do this, it will also force the join order. This means you have to arrange all your tables correctly so that their join order makes sense. Generally you would examine the execution plan the server already has and alter the order of the tables in your query to match. If you're not familiar with how to do this, the basics are that each "left" input comes first, and in graphical execution plans, the left input is the upper one. A complex join involving many tables may have to group joins together inside parentheses, or use RIGHT JOIN, in order to get an optimal execution plan (swap the left and right inputs, but introduce the table at the correct point in the join order).
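To illustrate the grouping idea, here is a hedged sketch with three tables. TableC, JoinColumn and the Id columns are made up for the example, and whether this particular order is a good one depends entirely on your data:

```sql
-- Hypothetical tables and columns, for illustration only.
-- Because a join hint is present, the written order is the executed order:
-- TableC is the first (left) input of the outer join, and the
-- parenthesized A-to-B join is evaluated as a unit and becomes the second input.
SELECT C.Id, A.Id, B.Id
FROM TableC AS C
INNER HASH JOIN
(
    TableA AS A                       -- left input of the inner join
    INNER HASH JOIN TableB AS B
        ON A.JoinColumn = B.JoinColumn
)
    ON C.JoinColumn = B.JoinColumn;
```

Without the parentheses, the joins would be performed strictly left to right as written, so grouping is the way to make a join result (rather than a single table) the right-hand input.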

It is generally best to avoid using join hints and forcing join order, so do whatever else you can first! You could look into the indexes on the tables, fragmentation, reducing column sizes (such as using varchar instead of nvarchar where Unicode is not required), or splitting the query into parts (insert to a temp table first, then join to that).
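The temp table approach might look something like the sketch below. The column names, the filter and the index name are assumptions made up for the example; the point is only that the staged rows get their own statistics, so the optimizer sees a realistic row count for the final join:

```sql
-- Hypothetical columns and filter, for illustration only.
-- Step 1: stage the smaller input in a temp table.
SELECT A.JoinColumn, A.SomeValue
INTO #SmallInput
FROM TableA AS A
WHERE A.SomeFilter = 1;

-- Optional: index the join column of the staged rows.
CREATE CLUSTERED INDEX IX_SmallInput_JoinColumn
    ON #SmallInput (JoinColumn);

-- Step 2: join to the staged rows with no hints, and let the optimizer choose.
SELECT S.SomeValue, B.*
FROM #SmallInput AS S
LEFT JOIN TableB AS B
    ON S.JoinColumn = B.JoinColumn;

DROP TABLE #SmallInput;
```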
