为什么这个(不相关的)子查询会导致这样的问题? [英] Why is this (non-correlated) subquery causing such problems?

查看:27
本文介绍了为什么这个(不相关的)子查询会导致这样的问题?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大型查询,其中一个简单的子查询优化将其从 8 分钟缩短到 20 秒.我不确定我是否理解为什么优化会产生如此剧烈的效果.

I've got a large query where a simple subquery optimization dropped it from 8 minutes down to 20 seconds. I'm not sure I understand why the optimization had such a drastic effect.

本质上,这里是问题部分:

In essence, here's the problem part:

SELECT  (bunch of stuff)
FROM
  a LEFT OUTER JOIN b ON a.ID = b.a
  LEFT OUTER JOIN c ON b.ID = c.b
  ...
  ...
      INNER JOIN veryLargeTable 
      ON a.ID = veryLargeTable.a 
         AND veryLargeTable.PetID = 
             (SELECT id from Pets WHERE Pets.Name = 'Something')    /* BAD! */
  ...
  ...

总共有 16 个连接表.如果我将 veryLargeTable 连接的第二个谓词替换为包含 petID 的预填充变量(而不是使用子查询),则整个查询会显着地加速:

In all, there are 16 joined tables. If I replace the second predicate of the veryLargeTable join with a pre-populated variable containing the petID (instead of using the subquery) the entire query speeds up dramatically:

AND veryLargeTable.PetID = @petID   /* Awesome! */


显然, (SELECT id from Pets WHERE Name = 'Something') 正在为每一行执行.有两件事我不完全理解:


Obviously, (SELECT id from Pets WHERE Name = 'Something') is being executed for every row. There are two things I don't fully understand:

  1. 据我所知,这是一个不相关的子查询.Pets 表根本不是外部查询的一部分.不相关的子查询不是独立评估(并因此优化)吗?为什么这里不是这种情况?

  1. As far as I can tell, this is a non-correlated subquery. The Pets table is not part of the outer query at all. Aren't non-correlated subqueries independently evaluated (and hence optimized)? Why isn't this the case here?

执行计划截然不同.在失败的情况下(上图),整个子树处理大约 950k 行.在获胜的情况下(使用变量而不是子查询),估计只有大约 125k 行.这是怎么回事?如果存在该子查询,为什么会涉及更多行?Pets.Name 列肯定具有唯一数据(但据我所知没有唯一约束).

The execution plans are dramatically different. In the failure case (above), the entire subtree deals with an estimated 950k rows. In the win case (using a variable instead of a subquery), there's only about 125k estimated rows. What's going on? Why are so many more rows involved if that subquery is there? The Pets.Name column definitely has unique data (but no unique constraint as far as I can tell).

请注意,将谓词移动到 WHERE 子句不会影响查询,正如我所料,因为它是一个 INNER JOIN.

Note that moving the predicate to the WHERE clause doesn't affect the query in either case, as I would expect, since it's an INNER JOIN.

感谢见解!

推荐答案

根据我的经验,查询越复杂,SQL 优化器创建灵巧计划的能力就越低.这里你有 16 个连接,一些或大部分是外连接,你至少有一个子查询……加入足够多的索引、基数、视图、外应用,谁知道还有什么,没有人,甚至微软也不知道工程师*,可以找出将统一和定期生成最佳计划的例程.

It has been my experience that, the more complex your queries get, the less able the SQL optimizer is to create deft plans. Here you've got 16 joins, some or most are outer joins, you've got at least one subquery... toss in enough indexes, cardinalities, views, outer applies, and who knows what else and no one, not even Microsoft engineers*, can figure out routines that will uniformly and regularly generate The most optimal plans.

你所描述的,我已经经历过无数次——在一个凌乱的查询中改变一件简单的事情,一切都会快一个数量级(或者,咬牙切齿,更慢).我没有办法确定复杂何时太复杂,它更像是一种感觉.我的一般经验法则是,如果它看起来太长或太复杂,请尽可能简化 - 例如预先选择的单个嵌套值,或者打破查询的一部分而不是总是使用小结果集快速运行,然后首先运行它并将结果存储在临时表中.

What you've described, I've experienced numerous times -- change one simple thing in a messy query and everything's an order of magnitude faster (or, gnashes teeth, slower). I have no method for determining when complex is too complex, it's more a feeling than anything else. My general rule of thumb is, if it looks too long or too complex, simplify where you can--such as your pre-selected single nested value, or breaking out part of the query than will always run fast with a small result set, and running it first and storing the results in a temp table.

( * 请注意这是轻微的讽刺)

( * Please note that this is mild sarcsam)

这篇关于为什么这个(不相关的)子查询会导致这样的问题?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆