将数据集与 EXCEPT 结合与在 LEFT JOIN 中检查 IS NULL [英] Combining datasets with EXCEPT versus checking on IS NULL in a LEFT JOIN
问题描述
我目前正在通过Microsoft SQL Server 2008 - 数据库开发(MCTS 考试 70-433) 认证.在之前关于组合数据集的一章中,我遇到了EXCEPT
(和INTERSECT
)命令.一个示例展示了如何使用 EXCEPT
从一个表中获取所有值,而该表在第二个表中没有相关值,如下所示:
I'm currently working my way through the Microsoft SQL Server 2008 - Database Development (MCTS Exam 70-433) certification. In one of the earlier chapters on Combining Datasets, I came across the EXCEPT
(and INTERSECT
) commands. One example shows how to use EXCEPT
to get all values from one table that doesn't have a related value in a second table like this:
SELECT EmployeeKey FROM DimEmployee
EXCEPT
SELECT EmployeeKey FROM FactResellerSales
EXCEPT
命令对我来说是新的,但是根据我今天之前所知道的,我仍然可以使用 LEFT JOIN
轻松解决问题并检查 ISNULL
在 join 约束上的方式如下:
The EXCEPT
command was new to me, but with what I knew before today I would still easily solve the problem using a LEFT JOIN
and check for IS NULL
on the join constraint the following way:
SELECT DISTINCT DimEmployee.EmployeeKey FROM DimEmployee
LEFT JOIN FactResellerSales ON FactResellerSales.EmployeeKey = DimEmployee.EmployeeKey
WHERE FactResellerSales.EmployeeKey IS NULL
现在我开始怀疑其中哪一个的性能最好.我试图研究查询执行计划,但我不太擅长阅读这些,所以它并没有让我变得更聪明.对于使用 EXCEPT
的查询,计划如下所示:
Now I started to wonder which one of these have the best performance. I tried to study the query execution plans, but I'm not too good at reading those, so it didn't make me any wiser. For the query using EXCEPT
, the plan looks like this:
|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey], [Expr1006]) WITH UNORDERED PREFETCH)
|--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[IX_DimEmployee_SalesTerritoryKey]))
|--Top(TOP EXPRESSION:((1)))
|--Index Seek(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), SEEK:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]) ORDERED FORWARD)
对于使用 LEFT JOIN
的那个,它看起来像这样:
And for the one using the LEFT JOIN
it looks like this:
|--Stream Aggregate(GROUP BY:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
|--Filter(WHERE:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey] IS NULL))
|--Merge Join(Left Outer Join, MERGE:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey])=([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]), RESIDUAL:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
|--Clustered Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[PK_DimEmployee_EmployeeKey]), ORDERED FORWARD)
|--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), ORDERED FORWARD)
查询中使用的表来自 AdventureWorksDW2008 示例数据库,因此下面我还包含了两个表的创建脚本,以防需要正确回答问题:
The tables used in the query are from the AdventureWorksDW2008 example database, so below I also include the create scripts for the two tables in case that is needed to answer the question correctly:
USE [AdventureWorksDW2008]
GO
/****** Object: Table [dbo].[DimEmployee] Script Date: 11/22/2010 20:30:20 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[DimEmployee](
[EmployeeKey] [int] IDENTITY(1,1) NOT NULL,
[ParentEmployeeKey] [int] NULL,
[EmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
[ParentEmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
[SalesTerritoryKey] [int] NULL,
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](50) NOT NULL,
[MiddleName] [nvarchar](50) NULL,
[NameStyle] [bit] NOT NULL,
[Title] [nvarchar](50) NULL,
[HireDate] [date] NULL,
[BirthDate] [date] NULL,
[LoginID] [nvarchar](256) NULL,
[EmailAddress] [nvarchar](50) NULL,
[Phone] [nvarchar](25) NULL,
[MaritalStatus] [nchar](1) NULL,
[EmergencyContactName] [nvarchar](50) NULL,
[EmergencyContactPhone] [nvarchar](25) NULL,
[SalariedFlag] [bit] NULL,
[Gender] [nchar](1) NULL,
[PayFrequency] [tinyint] NULL,
[BaseRate] [money] NULL,
[VacationHours] [smallint] NULL,
[SickLeaveHours] [smallint] NULL,
[CurrentFlag] [bit] NOT NULL,
[SalesPersonFlag] [bit] NOT NULL,
[DepartmentName] [nvarchar](50) NULL,
[StartDate] [date] NULL,
[EndDate] [date] NULL,
[Status] [nvarchar](50) NULL,
CONSTRAINT [PK_DimEmployee_EmployeeKey] PRIMARY KEY CLUSTERED
(
[EmployeeKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DimEmployee] WITH CHECK ADD CONSTRAINT [FK_DimEmployee_DimEmployee] FOREIGN KEY([ParentEmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO
ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimEmployee]
GO
ALTER TABLE [dbo].[DimEmployee] WITH CHECK ADD CONSTRAINT [FK_DimEmployee_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO
ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimSalesTerritory]
GO
第二个表:
USE [AdventureWorksDW2008]
GO
/****** Object: Table [dbo].[FactResellerSales] Script Date: 11/22/2010 20:30:47 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[FactResellerSales](
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[ResellerKey] [int] NOT NULL,
[EmployeeKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NULL,
[OrderQuantity] [smallint] NULL,
[UnitPrice] [money] NULL,
[ExtendedAmount] [money] NULL,
[UnitPriceDiscountPct] [float] NULL,
[DiscountAmount] [float] NULL,
[ProductStandardCost] [money] NULL,
[TotalProductCost] [money] NULL,
[SalesAmount] [money] NULL,
[TaxAmt] [money] NULL,
[Freight] [money] NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
CONSTRAINT [PK_FactResellerSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED
(
[SalesOrderNumber] ASC,
[SalesOrderLineNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimCurrency] FOREIGN KEY([CurrencyKey])
REFERENCES [dbo].[DimCurrency] ([CurrencyKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimCurrency]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate] FOREIGN KEY([OrderDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate1] FOREIGN KEY([DueDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate1]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimDate2] FOREIGN KEY([ShipDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate2]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimEmployee] FOREIGN KEY([EmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimEmployee]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimProduct] FOREIGN KEY([ProductKey])
REFERENCES [dbo].[DimProduct] ([ProductKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimProduct]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimPromotion] FOREIGN KEY([PromotionKey])
REFERENCES [dbo].[DimPromotion] ([PromotionKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimPromotion]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimReseller] FOREIGN KEY([ResellerKey])
REFERENCES [dbo].[DimReseller] ([ResellerKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimReseller]
GO
ALTER TABLE [dbo].[FactResellerSales] WITH CHECK ADD CONSTRAINT [FK_FactResellerSales_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO
ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimSalesTerritory]
GO
这个问题使用了一个非常具体的例子,但我也对有关何时/是否适合使用 EXCEPT
而不是 LEFT JOIN
的一般信息感兴趣> 并检查 IS NULL
.
This question is using a very specific example, but I'm also interested in general information about when/if it would be appropriate to use EXCEPT
instead of a LEFT JOIN
and check on IS NULL
.
我还注意到,如果我尝试使用 INTERCEPT
运行第一个查询,则第二个查询中的等效项将使用标准的 JOIN
和 SELECTDISTINCT DimEmployee.EmployeeKey
(根本没有 WHERE
子句).然而,在这种情况下,两种情况下的执行计划完全相同.
I also noticed, that if I tried running the 1st query with INTERCEPT
, the equivalent in the 2nd query would be to use a standard JOIN
and SELECT DISTINCT DimEmployee.EmployeeKey
(and no WHERE
clause at all). In this case however, the execution plan was exactly the same in the two cases.
更新
对第二个查询的小更新(请参阅修订历史记录),这导致该查询的查询计划更加复杂.我猜更大的查询计划表示一个不太理想的查询,但我仍然希望回答这个问题.
Update
Minor update to the 2nd query (see revision history), which resulted in a bit more complex query plan for that one. I would guess that a bigger query plan indicates a less optimal query, but I'd still like this question to be answered.
推荐答案
在 LEFT JOIN 给出多行的情况下,这 2 个查询将有所不同.也就是说,FactResellerSales 是 DimEmployee 的子代,在 DimEmployee 中每行有许多行.因此,正如您在 JOIN 示例中指出的那样,您需要 DISTINCT.
The 2 queries will be different in cases where the LEFT JOIN gives multiple rows. That is, FactResellerSales is a child of DimEmployee with many rows per rows in DimEmployee. So you need DISTINCT as you noted for your JOIN example.
如果您将查询更改为使用 NOT EXISTS,您会得到相同的计划(左反半连接对于 NOT EXISTS 是典型的)
If you change the query to use NOT EXISTS you get the same plan (left anti semi join is typical for NOT EXISTS)
SELECT EmployeeKey
FROM DimEmployee DE
WHERE
NOT EXISTS (SELECT * FROM
FactResellerSales FRS
WHERE FRS.EmployeeKey = DE.EmployeeKey)
此外,出于同样的原因,INTERSECT/EXISTS 很可能会给出相同的计划.
In addition and for the same reason, INTERSECT/EXISTS will most likely give the same plan.
这是 JOIN/EXISTS/IN
或 OUTER JOIN/NOT EXISTS/NOT IN
争论的另一个方面.INTERSECT/EXCEPT 是一个稍微优雅的 (NOT) EXISTS 你喜欢的结构
It's another aspect to the JOIN/EXISTS/IN
or OUTER JOIN/NOT EXISTS/NOT IN
debates. INTERSECT/EXCEPT is a slightly more elegant construct of (NOT) EXISTS is you like
没有明显的问题...
就我个人而言,我不使用 OUTER JOIN 进行存在"检查:我使用 EXISTS 或 NOT EXISTS(如果我记得的话,或者使用 INTERSECT/EXCEPT),因为这更明显你想要做什么.AKA,如果我不需要外部"表中的行,我不会使用它来避免 DISTINCT.
Personally, I don't use OUTER JOIN for "existence" checking: I use EXISTS or NOT EXISTS (or INTERSECT/EXCEPT if I remember) because it's more obvious what you're trying to do. AKA, if I don't need rows from the "outer" table, I don't use it to avoid having a DISTINCT.
在这种情况下没有使用 OUTER JOIN/IS NULL 恕我直言 if 的情况.当然,我会在需要时使用 OUTER JOIN:此答案仅适用于一种特定情况.
There is no CASE to use OUTER JOIN/IS NULL IMHO if in this case. Of course, I use OUTER JOIN when needed: this answer is for one specific case only.
这篇关于将数据集与 EXCEPT 结合与在 LEFT JOIN 中检查 IS NULL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!