NOT IN vs NOT EXISTS

Question

Which of these queries is the faster?

NOT EXISTS:

SELECT ProductID, ProductName 
FROM Northwind..Products p
WHERE NOT EXISTS (
    SELECT 1 
    FROM Northwind..[Order Details] od 
    WHERE p.ProductId = od.ProductId)

or NOT IN:

SELECT ProductID, ProductName 
FROM Northwind..Products p
WHERE p.ProductID NOT IN (
    SELECT ProductID 
    FROM Northwind..[Order Details])

The query execution plan says they both do the same thing. If that is the case, which is the recommended form?

This is based on the Northwind database.

Just found this helpful article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

I think I'll stick with NOT EXISTS.

Recommended answer

I always default to NOT EXISTS.

The execution plans may be the same at the moment, but if either column is altered in the future to allow NULLs, the NOT IN version will need to do more work (even if no NULLs are actually present in the data), and the semantics of NOT IN when NULLs are present are unlikely to be the ones you want anyway.
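
To make the semantic difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. Only the standard three-valued NULL logic carries over; the execution plans do not. The table name OrderDetails (in place of [Order Details]) and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption: we only
# illustrate NULL semantics here, not execution plans).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);
    CREATE TABLE OrderDetails (ProductID INTEGER);  -- hypothetical [Order Details]
    INSERT INTO Products VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO OrderDetails VALUES (1), (2);
""")

NOT_IN = """
    SELECT ProductID FROM Products p
    WHERE p.ProductID NOT IN (SELECT ProductID FROM OrderDetails)
    ORDER BY ProductID
"""
NOT_EXISTS = """
    SELECT ProductID FROM Products p
    WHERE NOT EXISTS (SELECT 1 FROM OrderDetails od
                      WHERE p.ProductID = od.ProductID)
    ORDER BY ProductID
"""

def ids(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# No NULLs yet: both forms return the one product that was never ordered.
before = (ids(NOT_IN), ids(NOT_EXISTS))

# One NULL in the subquery column: "p.ProductID <> NULL" is UNKNOWN, so the
# NOT IN predicate can never be TRUE and the query returns no rows at all.
conn.execute("INSERT INTO OrderDetails VALUES (NULL)")
after = (ids(NOT_IN), ids(NOT_EXISTS))

print(before)  # ([3], [3])
print(after)   # ([], [3])
```

NOT EXISTS keeps returning product 3 because an equality comparison with NULL simply fails to match, whereas NOT IN turns the whole predicate UNKNOWN.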

When neither Products.ProductID nor [Order Details].ProductID allows NULLs, the NOT IN will be treated identically to the following query.

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId) 

The exact plan may vary, but for my example data I get a plan that implements this as a merge join performing a left anti semi join.

A reasonably common misconception seems to be that correlated subqueries are always "bad" compared to joins. They certainly can be when they force a nested loops plan (the subquery evaluated row by row), but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops; they can use hash or merge (as in this example) joins too.

/*Not valid syntax but better reflects the plan*/ 
SELECT p.ProductID,
       p.ProductName
FROM   Products p
       LEFT ANTI SEMI JOIN [Order Details] od
         ON p.ProductId = od.ProductId 
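
As a rough illustration of the hash and merge physical variants, here is a minimal in-memory Python sketch of both anti semi join algorithms. It is not SQL Server's implementation. It assumes NULL-free integer keys, and for the merge variant it assumes both inputs are already sorted on the join key. The function and variable names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical (ProductID, ProductName) row

def hash_anti_semi_join(outer: Iterable[Row], inner_keys: Iterable[int]) -> List[Row]:
    # Build phase: hash every inner key once.
    seen = set(inner_keys)
    # Probe phase: keep an outer row only if its key has no match.
    return [row for row in outer if row[0] not in seen]

def merge_anti_semi_join(outer: Sequence[Row], inner_keys: Sequence[int]) -> List[Row]:
    # Both inputs must already be sorted on the key (as an index would supply).
    result = []
    i = 0
    for row in outer:
        key = row[0]
        # Advance the inner cursor past keys smaller than the current outer key.
        while i < len(inner_keys) and inner_keys[i] < key:
            i += 1
        # Emit the outer row only when no equal inner key exists.
        if i == len(inner_keys) or inner_keys[i] != key:
            result.append(row)
    return result

products = [(1, "A"), (2, "B"), (3, "C"), (4, "D")]  # sorted by ProductID
order_detail_ids = [1, 1, 2, 4, 4]                    # sorted, with duplicates

print(hash_anti_semi_join(products, order_detail_ids))   # [(3, 'C')]
print(merge_anti_semi_join(products, order_detail_ids))  # [(3, 'C')]
```

Neither variant evaluates a subquery per outer row. The hash version reads the inner input once to build a set, and the merge version walks both sorted inputs in a single pass.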

If [Order Details].ProductID is NULL-able, the query then becomes

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL) 

The reason for this is that, if [Order Details] contains any NULL ProductIds, the correct semantics is to return no results. To verify this, look for the extra anti semi join and row count spool that are added to the plan.
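
The equivalence claimed above can be spot-checked with a small sketch. It again uses Python's sqlite3 as a stand-in for SQL Server and assumes a NOT NULL Products.ProductID, a NULL-able OrderDetails.ProductID (a hypothetical name for [Order Details]), and made-up rows. It runs both the NOT IN query and the two-predicate NOT EXISTS rewrite, before and after a NULL appears in the subquery column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER NOT NULL, ProductName TEXT);
    CREATE TABLE OrderDetails (ProductID INTEGER);  -- NULL-able, as in the text
    INSERT INTO Products VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO OrderDetails VALUES (1), (2);
""")

NOT_IN = """SELECT ProductID FROM Products p
            WHERE p.ProductID NOT IN (SELECT ProductID FROM OrderDetails)
            ORDER BY ProductID"""

# The rewrite from the answer: the ordinary anti semi join plus a check that
# the subquery column contains no NULL at all.
REWRITE = """SELECT ProductID FROM Products p
             WHERE NOT EXISTS (SELECT * FROM OrderDetails od
                               WHERE p.ProductID = od.ProductID)
               AND NOT EXISTS (SELECT * FROM OrderDetails
                               WHERE ProductID IS NULL)
             ORDER BY ProductID"""

def ids(sql):
    """Run a query and return the first column as a list."""
    return [r[0] for r in conn.execute(sql)]

results = [(ids(NOT_IN), ids(REWRITE))]          # no NULLs in OrderDetails
conn.execute("INSERT INTO OrderDetails VALUES (NULL)")
results.append((ids(NOT_IN), ids(REWRITE)))      # one NULL in OrderDetails

print(results)  # [([3], [3]), ([], [])]
```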

If Products.ProductID is also changed to become NULL-able, the query then becomes

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL)
       AND NOT EXISTS (SELECT *
                       FROM   (SELECT TOP 1 *
                               FROM   [Order Details]) S
                       WHERE  p.ProductID IS NULL) 

The reason for that one is that a NULL Products.ProductId should not be returned in the results unless the NOT IN subquery returns no results at all (i.e. the [Order Details] table is empty), in which case it should be returned. In the plan for my sample data, this is implemented by adding another anti semi join.
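
The three-predicate rewrite can be checked the same way. This is a sketch under stated assumptions: Python's sqlite3 stands in for SQL Server, LIMIT 1 replaces T-SQL's TOP 1, OrderDetails is a hypothetical name for [Order Details], and the rows are made up. It covers the three cases discussed above: a non-empty NULL-free subquery, an empty table, and a subquery containing a NULL.

```python
import sqlite3

# SQLite stands in for SQL Server; only NULL semantics carry over, not plans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER, ProductName TEXT);  -- NULL-able
    CREATE TABLE OrderDetails (ProductID INTEGER);                -- NULL-able
    INSERT INTO Products VALUES (1, 'A'), (NULL, 'B'), (3, 'C');
    INSERT INTO OrderDetails VALUES (1);
""")

NOT_IN = """SELECT ProductID FROM Products p
            WHERE p.ProductID NOT IN (SELECT ProductID FROM OrderDetails)
            ORDER BY ProductID"""

# Three-predicate rewrite from the answer; LIMIT 1 replaces T-SQL's TOP 1.
REWRITE = """SELECT ProductID FROM Products p
             WHERE NOT EXISTS (SELECT * FROM OrderDetails od
                               WHERE p.ProductID = od.ProductID)
               AND NOT EXISTS (SELECT * FROM OrderDetails
                               WHERE ProductID IS NULL)
               AND NOT EXISTS (SELECT *
                               FROM (SELECT * FROM OrderDetails LIMIT 1) S
                               WHERE p.ProductID IS NULL)
             ORDER BY ProductID"""

def ids(sql):
    """Run a query and return the first column as a list (None = NULL)."""
    return [r[0] for r in conn.execute(sql)]

results = []
# 1) Non-empty, NULL-free OrderDetails: the NULL product is excluded.
results.append((ids(NOT_IN), ids(REWRITE)))
# 2) Empty OrderDetails: NOT IN is TRUE for every row, even a NULL ProductID.
conn.execute("DELETE FROM OrderDetails")
results.append((ids(NOT_IN), ids(REWRITE)))
# 3) OrderDetails containing a NULL: no row qualifies.
conn.executemany("INSERT INTO OrderDetails VALUES (?)", [(1,), (None,)])
results.append((ids(NOT_IN), ids(REWRITE)))

for not_in, rewrite in results:
    print(not_in, rewrite)
```

In each case the two lists agree. The third predicate only matters for the row whose ProductID is NULL, and it rejects that row whenever OrderDetails has at least one row.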

The effect of this is shown in the blog post already linked by Buckley. In the example there, the number of logical reads increases from around 400 to 500,000.

Additionally, the fact that a single NULL can reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but in fact there are no NULL rows in the data, the rest of the execution plan may be catastrophically worse when this is just part of a larger query, for example with inappropriate nested loops causing repeated execution of an expensive subtree.

This is not the only possible execution plan for a NOT IN on a NULL-able column, however. This article shows another one for a query against the AdventureWorks2008 database.

For the NOT IN on a NOT NULL column, or the NOT EXISTS against either a nullable or non-nullable column, it gives the following plan.

When the column changes to NULL-able, the NOT IN plan now looks like

It adds an extra inner join operator to the plan. This apparatus is explained here. It is all there to convert the previous single correlated index seek on Sales.SalesOrderDetail.ProductID = <correlated_product_id> into two seeks per outer row. The additional one is on WHERE Sales.SalesOrderDetail.ProductID IS NULL.

As this is under an anti semi join, if that one returns any rows, the second seek will not occur. However, if Sales.SalesOrderDetail does not contain any NULL ProductIDs, it will double the number of seek operations required.
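
The seek arithmetic above can be illustrated with a toy Python model of a nested loops anti semi join. It is a simplified sketch, not SQL Server's operator. The function name is hypothetical, a Python set stands in for the index, and the sketch assumes the IS NULL seek is tried first. With the opposite order, outer rows that find a match would need only one seek.

```python
from typing import List, Optional, Tuple

def anti_semi_join_seeks(
    outer_ids: List[int], inner_ids: List[Optional[int]], nullable: bool
) -> Tuple[int, List[int]]:
    """Count simulated index seeks for NOT IN run as a nested loops anti semi join."""
    index = set(inner_ids)  # stand-in for the ProductID index; None models NULL
    seeks = 0
    kept = []
    for pid in outer_ids:
        if nullable:
            # Extra seek on WHERE ProductID IS NULL (ordering assumed in this sketch).
            seeks += 1
            if None in index:
                continue  # a NULL exists, so the anti semi join rejects the row
        # Correlated seek on WHERE ProductID = <correlated_product_id>.
        seeks += 1
        if pid not in index:
            kept.append(pid)
    return seeks, kept

outer = [1, 2, 3, 4, 5]
print(anti_semi_join_seeks(outer, [1, 2, 2, 4], nullable=False))       # (5, [3, 5])
print(anti_semi_join_seeks(outer, [1, 2, 2, 4], nullable=True))        # (10, [3, 5])
print(anti_semi_join_seeks(outer, [1, 2, 2, 4, None], nullable=True))  # (5, [])
```

In this model, a NULL-able column with no NULLs in the data returns the same rows as the NOT NULL case but performs twice as many seeks. Once a NULL is present, each outer row is rejected after a single seek.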
