嵌套查询的执行次数 [英] Number of times a nested query is executed
问题描述
此嵌套子查询将执行多少次?
SELECT CID, CNAME
FROM CUSTOMER
WHERE EXISTS ( SELECT CID
FROM RENTALS
WHERE CUSTOMER.CID = RENTALS.CID
AND PICKUP = 'CARY' )
这是一个理论问题,即可以在我的书中找到.提供的答案是6,但我不明白为什么会这样.
好的,我认为这本书本身存在一些问题.我会仔细阅读这本书,也许以后再问这个问题.
正如其他人指出的那样,该相关子查询可以重写为联接,但这并不是完整的故事,因为未转换的EXISTS的执行计划正在进行中看起来还是很像联接.因此,这实际上不是语法问题,而是查询优化问题.
EXISTS实际上只是加入此数据集但即使有1,000,000个匹配项也只能加入其中的一行"的句法简写,也称为 解决方案
As others point out, this correlated subquery can be rewritten as a join, but that is not entirely the complete story because the execution plan for an untransformed EXISTS is going to look pretty much like a join anyway. So this is not really a syntax question but a query optimisation issue.
EXISTS is really just syntactic shorthand for "join to this data set but only to a single row in it even if there are 1,000,000 matches", or what is also known as a semijoin.
So the semijoin required by an EXISTS predicate against a correlated or uncorrelated subquery can be implemented in a number of ways, which depend to a large extent on the numbers or records in the two tables.
If you imagine that CUSTOMER is estimated to have a single row, and the optimiser estimates that there are many thousands of rows in RENTALS for which PICKUP = 'CARY', then the optimiser's is pretty likely to read the row from the CUSTOMER TABLE and perform a single lookup against the RENTALS table.
If there are an estimated one million CUSTOMERS and only one row in the RENTALS table then that execution plan would be crazy -- the optimiser could instead invert the join by leading with the RENTALS table and looking up against the CUSTOMER table the single row to be returned. In that case the subquery has arguably not been executed at all.
In between these extremes there are various other optimisation. For example, building the unique values of the RENTAL.CID column into an in-memory hash table for rows where PICKUP='CARY' and full-scanning the CUSTOMER TABLE to probe that hash table for every row, which would be a HASH SEMIJOIN. Again, no execution of a recognisable subquery. (And the query rewrite that Barmer suggests is likely to lead to that kind of plan, but might also constrain the optimiser from seeing other plans suitable for other data distributions and cardinalities).
So, as other answers say, the question is really moot, because I think that there are two important lessons here:
- Many different SQL statements can lead to the same execution plan, and a single SQL statement can also lead to multiple execution plans.
- You should write queries that express syntactically the result that you want, and not in general to preempt and constrain the query optimiser's choices.
The second point is important, as it argues against some developer's instinct to avoid writing correlated subqueries (and EXISTS or NOT EXISTS in particular it seems) in the hope of providing their own optimisation. In particular, replacing an EXISTS with an outer join can be very suboptimal given the right/wrong data distributions.
To get to the direct answer to the question, I'd say either:
- 0
- 1
- However many rows are in CUSTOMERS
- Possibly something else.
这篇关于嵌套查询的执行次数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!