嵌套查询的执行次数 [英] Number of times a nested query is executed

查看:200
本文介绍了嵌套查询的执行次数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此嵌套子查询将执行多少次?

SELECT CID, CNAME 
  FROM CUSTOMER
 WHERE EXISTS ( SELECT CID 
                  FROM RENTALS 
                 WHERE CUSTOMER.CID = RENTALS.CID 
                   AND PICKUP = 'CARY' )

这是一个理论问题,即可以在我的书中找到.提供的答案是6,但我不明白为什么会这样.


好的,我认为这本书本身存在一些问题.我会仔细阅读这本书,也许以后再问这个问题.

解决方案

正如其他人指出的那样,该相关子查询可以重写为联接,但这并不是完整的故事,因为未转换的EXISTS的执行计划正在进行中看起来还是很像联接.因此,这实际上不是语法问题,而是查询优化问题.

EXISTS实际上只是加入此数据集但即使有1,000,000个匹配项也只能加入其中的一行"的句法简写,也称为 解决方案

As others point out, this correlated subquery can be rewritten as a join, but that is not entirely the complete story because the execution plan for an untransformed EXISTS is going to look pretty much like a join anyway. So this is not really a syntax question but a query optimisation issue.

EXISTS is really just syntactic shorthand for "join to this data set but only to a single row in it even if there are 1,000,000 matches", or what is also known as a semijoin.

So the semijoin required by an EXISTS predicate against a correlated or uncorrelated subquery can be implemented in a number of ways, which depend to a large extent on the numbers or records in the two tables.

If you imagine that CUSTOMER is estimated to have a single row, and the optimiser estimates that there are many thousands of rows in RENTALS for which PICKUP = 'CARY', then the optimiser's is pretty likely to read the row from the CUSTOMER TABLE and perform a single lookup against the RENTALS table.

If there are an estimated one million CUSTOMERS and only one row in the RENTALS table then that execution plan would be crazy -- the optimiser could instead invert the join by leading with the RENTALS table and looking up against the CUSTOMER table the single row to be returned. In that case the subquery has arguably not been executed at all.

In between these extremes there are various other optimisation. For example, building the unique values of the RENTAL.CID column into an in-memory hash table for rows where PICKUP='CARY' and full-scanning the CUSTOMER TABLE to probe that hash table for every row, which would be a HASH SEMIJOIN. Again, no execution of a recognisable subquery. (And the query rewrite that Barmer suggests is likely to lead to that kind of plan, but might also constrain the optimiser from seeing other plans suitable for other data distributions and cardinalities).

So, as other answers say, the question is really moot, because I think that there are two important lessons here:

  1. Many different SQL statements can lead to the same execution plan, and a single SQL statement can also lead to multiple execution plans.
  2. You should write queries that express syntactically the result that you want, and not in general to preempt and constrain the query optimiser's choices.

The second point is important, as it argues against some developer's instinct to avoid writing correlated subqueries (and EXISTS or NOT EXISTS in particular it seems) in the hope of providing their own optimisation. In particular, replacing an EXISTS with an outer join can be very suboptimal given the right/wrong data distributions.

To get to the direct answer to the question, I'd say either:

  • 0
  • 1
  • However many rows are in CUSTOMERS
  • Possibly something else.

这篇关于嵌套查询的执行次数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆