将数据集与 EXCEPT 结合与在 LEFT JOIN 中检查 IS NULL [英] Combining datasets with EXCEPT versus checking on IS NULL in a LEFT JOIN

查看:27
本文介绍了将数据集与 EXCEPT 结合与在 LEFT JOIN 中检查 IS NULL的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在通过Microsoft SQL Server 2008 - 数据库开发(MCTS 考试 70-433) 认证.在之前关于组合数据集的一章中,我遇到了EXCEPT(和INTERSECT)命令.一个示例展示了如何使用 EXCEPT 从一个表中获取所有值,而该表在第二个表中没有相关值,如下所示:

I'm currently working my way through the Microsoft SQL Server 2008 - Database Development (MCTS Exam 70-433) certification. In one of the earlier chapters on Combining Datasets, I came across the EXCEPT (and INTERSECT) commands. One example shows how to use EXCEPT to get all values from one table that doesn't have a related value in a second table like this:

SELECT EmployeeKey FROM DimEmployee
EXCEPT
SELECT EmployeeKey FROM FactResellerSales

EXCEPT 命令对我来说是新的,但是根据我今天之前所知道的,我仍然可以使用 LEFT JOIN 轻松解决问题并检查 ISNULL 在 join 约束上的方式如下:

The EXCEPT command was new to me, but with what I knew before today I would still easily solve the problem using a LEFT JOIN and check for IS NULL on the join constraint the following way:

SELECT DISTINCT DimEmployee.EmployeeKey FROM DimEmployee
LEFT JOIN FactResellerSales ON FactResellerSales.EmployeeKey = DimEmployee.EmployeeKey
WHERE FactResellerSales.EmployeeKey IS NULL

现在我开始怀疑其中哪一个的性能最好.我试图研究查询执行计划,但我不太擅长阅读这些,所以它并没有让我变得更聪明.对于使用 EXCEPT 的查询,计划如下所示:

Now I started to wonder which one of these have the best performance. I tried to study the query execution plans, but I'm not too good at reading those, so it didn't make me any wiser. For the query using EXCEPT, the plan looks like this:

|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey], [Expr1006]) WITH UNORDERED PREFETCH)
    |--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[IX_DimEmployee_SalesTerritoryKey]))
    |--Top(TOP EXPRESSION:((1)))
        |--Index Seek(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), SEEK:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]) ORDERED FORWARD)

对于使用 LEFT JOIN 的那个,它看起来像这样:

And for the one using the LEFT JOIN it looks like this:

|--Stream Aggregate(GROUP BY:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
    |--Filter(WHERE:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey] IS NULL))
        |--Merge Join(Left Outer Join, MERGE:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey])=([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]), RESIDUAL:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
            |--Clustered Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[PK_DimEmployee_EmployeeKey]), ORDERED FORWARD)
            |--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), ORDERED FORWARD)

查询中使用的表来自 AdventureWorksDW2008 示例数据库,因此下面我还包含了两个表的创建脚本,以防需要正确回答问题:

The tables used in the query are from the AdventureWorksDW2008 example database, so below I also include the create scripts for the two tables in case that is needed to answer the question correctly:

USE [AdventureWorksDW2008]
GO

/****** Object:  Table [dbo].[DimEmployee]    Script Date: 11/22/2010 20:30:20 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[DimEmployee](
    [EmployeeKey] [int] IDENTITY(1,1) NOT NULL,
    [ParentEmployeeKey] [int] NULL,
    [EmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
    [ParentEmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
    [SalesTerritoryKey] [int] NULL,
    [FirstName] [nvarchar](50) NOT NULL,
    [LastName] [nvarchar](50) NOT NULL,
    [MiddleName] [nvarchar](50) NULL,
    [NameStyle] [bit] NOT NULL,
    [Title] [nvarchar](50) NULL,
    [HireDate] [date] NULL,
    [BirthDate] [date] NULL,
    [LoginID] [nvarchar](256) NULL,
    [EmailAddress] [nvarchar](50) NULL,
    [Phone] [nvarchar](25) NULL,
    [MaritalStatus] [nchar](1) NULL,
    [EmergencyContactName] [nvarchar](50) NULL,
    [EmergencyContactPhone] [nvarchar](25) NULL,
    [SalariedFlag] [bit] NULL,
    [Gender] [nchar](1) NULL,
    [PayFrequency] [tinyint] NULL,
    [BaseRate] [money] NULL,
    [VacationHours] [smallint] NULL,
    [SickLeaveHours] [smallint] NULL,
    [CurrentFlag] [bit] NOT NULL,
    [SalesPersonFlag] [bit] NOT NULL,
    [DepartmentName] [nvarchar](50) NULL,
    [StartDate] [date] NULL,
    [EndDate] [date] NULL,
    [Status] [nvarchar](50) NULL,
 CONSTRAINT [PK_DimEmployee_EmployeeKey] PRIMARY KEY CLUSTERED 
(
    [EmployeeKey] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[DimEmployee]  WITH CHECK ADD  CONSTRAINT [FK_DimEmployee_DimEmployee] FOREIGN KEY([ParentEmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimEmployee]
GO

ALTER TABLE [dbo].[DimEmployee]  WITH CHECK ADD  CONSTRAINT [FK_DimEmployee_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimSalesTerritory]
GO

第二个表:

USE [AdventureWorksDW2008]
GO

/****** Object:  Table [dbo].[FactResellerSales]    Script Date: 11/22/2010 20:30:47 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FactResellerSales](
    [ProductKey] [int] NOT NULL,
    [OrderDateKey] [int] NOT NULL,
    [DueDateKey] [int] NOT NULL,
    [ShipDateKey] [int] NOT NULL,
    [ResellerKey] [int] NOT NULL,
    [EmployeeKey] [int] NOT NULL,
    [PromotionKey] [int] NOT NULL,
    [CurrencyKey] [int] NOT NULL,
    [SalesTerritoryKey] [int] NOT NULL,
    [SalesOrderNumber] [nvarchar](20) NOT NULL,
    [SalesOrderLineNumber] [tinyint] NOT NULL,
    [RevisionNumber] [tinyint] NULL,
    [OrderQuantity] [smallint] NULL,
    [UnitPrice] [money] NULL,
    [ExtendedAmount] [money] NULL,
    [UnitPriceDiscountPct] [float] NULL,
    [DiscountAmount] [float] NULL,
    [ProductStandardCost] [money] NULL,
    [TotalProductCost] [money] NULL,
    [SalesAmount] [money] NULL,
    [TaxAmt] [money] NULL,
    [Freight] [money] NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [CustomerPONumber] [nvarchar](25) NULL,
 CONSTRAINT [PK_FactResellerSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED 
(
    [SalesOrderNumber] ASC,
    [SalesOrderLineNumber] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimCurrency] FOREIGN KEY([CurrencyKey])
REFERENCES [dbo].[DimCurrency] ([CurrencyKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimCurrency]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate] FOREIGN KEY([OrderDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate1] FOREIGN KEY([DueDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate1]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate2] FOREIGN KEY([ShipDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate2]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimEmployee] FOREIGN KEY([EmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimEmployee]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimProduct] FOREIGN KEY([ProductKey])
REFERENCES [dbo].[DimProduct] ([ProductKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimProduct]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimPromotion] FOREIGN KEY([PromotionKey])
REFERENCES [dbo].[DimPromotion] ([PromotionKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimPromotion]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimReseller] FOREIGN KEY([ResellerKey])
REFERENCES [dbo].[DimReseller] ([ResellerKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimReseller]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimSalesTerritory]
GO

这个问题使用了一个非常具体的例子,但我也对有关何时/是否适合使用 EXCEPT 而不是 LEFT JOIN 的一般信息感兴趣> 并检查 IS NULL.

This question is using a very specific example, but I'm also interested in general information about when/if it would be appropriate to use EXCEPT instead of a LEFT JOIN and check on IS NULL.

我还注意到,如果我尝试使用 INTERCEPT 运行第一个查询,则第二个查询中的等效项将使用标准的 JOINSELECTDISTINCT DimEmployee.EmployeeKey(根本没有 WHERE 子句).然而,在这种情况下,两种情况下的执行计划完全相同.

I also noticed, that if I tried running the 1st query with INTERCEPT, the equivalent in the 2nd query would be to use a standard JOIN and SELECT DISTINCT DimEmployee.EmployeeKey (and no WHERE clause at all). In this case however, the execution plan was exactly the same in the two cases.

更新
对第二个查询的小更新(请参阅修订历史记录),这导致该查询的查询计划更加复杂.我猜更大的查询计划表示一个不太理想的查询,但我仍然希望回答这个问题.

Update
Minor update to the 2nd query (see revision history), which resulted in a bit more complex query plan for that one. I would guess that a bigger query plan indicates a less optimal query, but I'd still like this question to be answered.

推荐答案

在 LEFT JOIN 给出多行的情况下,这 2 个查询将有所不同.也就是说,FactResellerSales 是 DimEmployee 的子代,在 DimEmployee 中每行有许多行.因此,正如您在 JOIN 示例中指出的那样,您需要 DISTINCT.

The 2 queries will be different in cases where the LEFT JOIN gives multiple rows. That is, FactResellerSales is a child of DimEmployee with many rows per rows in DimEmployee. So you need DISTINCT as you noted for your JOIN example.

如果您将查询更改为使用 NOT EXISTS,您会得到相同的计划(左反半连接对于 NOT EXISTS 是典型的)

If you change the query to use NOT EXISTS you get the same plan (left anti semi join is typical for NOT EXISTS)

SELECT EmployeeKey
FROM DimEmployee DE 
WHERE
NOT EXISTS (SELECT * FROM 
        FactResellerSales FRS
    WHERE FRS.EmployeeKey = DE.EmployeeKey)

此外,出于同样的原因,INTERSECT/EXISTS 很可能会给出相同的计划.

In addition and for the same reason, INTERSECT/EXISTS will most likely give the same plan.

这是 JOIN/EXISTS/INOUTER JOIN/NOT EXISTS/NOT IN 争论的另一个方面.INTERSECT/EXCEPT 是一个稍微优雅的 (NOT) EXISTS 你喜欢的结构

It's another aspect to the JOIN/EXISTS/IN or OUTER JOIN/NOT EXISTS/NOT IN debates. INTERSECT/EXCEPT is a slightly more elegant construct of (NOT) EXISTS is you like

没有明显的问题...

就我个人而言,我不使用 OUTER JOIN 进行存在"检查:我使用 EXISTS 或 NOT EXISTS(如果我记得的话,或者使用 INTERSECT/EXCEPT),因为这更明显你想要做什么.AKA,如果我不需要外部"表中的行,我不会使用它来避免 DISTINCT.

Personally, I don't use OUTER JOIN for "existence" checking: I use EXISTS or NOT EXISTS (or INTERSECT/EXCEPT if I remember) because it's more obvious what you're trying to do. AKA, if I don't need rows from the "outer" table, I don't use it to avoid having a DISTINCT.

在这种情况下没有使用 OUTER JOIN/IS NULL 恕我直言 if 的情况.当然,我会在需要时使用 OUTER JOIN:此答案仅适用于一种特定情况.

There is no CASE to use OUTER JOIN/IS NULL IMHO if in this case. Of course, I use OUTER JOIN when needed: this answer is for one specific case only.

这篇关于将数据集与 EXCEPT 结合与在 LEFT JOIN 中检查 IS NULL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆