大行集的IN与JOIN [英] IN vs. JOIN with large rowsets

查看:107
本文介绍了大行集的IN与JOIN的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想选择一个表中的行,其中主键位于另一个表中.我不确定在SQL Server 2005中是否应该使用JOIN或IN运算符.在具有大型数据集(即数百万行)的这两个SQL查询之间,是否存在明显的性能差异?

I'm wanting to select rows in a table where the primary key is in another table. I'm not sure if I should use a JOIN or the IN operator in SQL Server 2005. Is there any significant performance difference between these two SQL queries with a large dataset (i.e. millions of rows)?

SELECT *
FROM a
WHERE a.c IN (SELECT d FROM b)

SELECT a.*
FROM a JOIN b ON a.c = b.d

推荐答案

更新:

我博客中的这篇文章总结了我的答案和对其他答案的评论,并显示了实际的执行计划:

This article in my blog summarizes both my answer and my comments to another answers, and shows actual execution plans:

SELECT  *
FROM    a
WHERE   a.c IN (SELECT d FROM b)

SELECT  a.*
FROM    a
JOIN    b
ON      a.c = b.d

这些查询不相同.如果您的表b不是键保存的(即b.d的值不是唯一的),它们会产生不同的结果.

These queries are not equivalent. They can yield different results if your table b is not key preserved (i. e. the values of b.d are not unique).

第一个查询的等效项如下:

The equivalent of the first query is the following:

SELECT  a.*
FROM    a
JOIN    (
        SELECT  DISTINCT d
        FROM    b
        ) bo
ON      a.c = bo.d

如果b.dUNIQUE并标记为这样(用UNIQUE INDEXUNIQUE CONSTRAINT),则这些查询是相同的,并且很可能将使用相同的计划,因为SQL Server足够聪明,可以执行此操作考虑在内.

If b.d is UNIQUE and marked as such (with a UNIQUE INDEX or UNIQUE CONSTRAINT), then these queries are identical and most probably will use identical plans, since SQL Server is smart enough to take this into account.

SQL Server可以采用以下方法之一来运行此查询:

SQL Server can employ one of the following methods to run this query:

  • 如果a.c上有索引,dUNIQUE,并且ba相比较小,则该条件将传播到子查询和普通INNER JOIN用于(以b开头)

  • If there is an index on a.c, d is UNIQUE and b is relatively small compared to a, then the condition is propagated into the subquery and the plain INNER JOIN is used (with b leading)

如果b.d上有索引,而d不是UNIQUE,则该条件也会传播,并使用LEFT SEMI JOIN.也可以用于上述条件.

If there is an index on b.d and d is not UNIQUE, then the condition is also propagated and LEFT SEMI JOIN is used. It can also be used for the condition above.

如果b.da.c上都有索引并且它们很大,则使用MERGE SEMI JOIN

If there is an index on both b.d and a.c and they are large, then MERGE SEMI JOIN is used

如果任何表上都没有索引,则在b上构建哈希表,并使用HASH SEMI JOIN.

If there is no index on any table, then a hash table is built on b and HASH SEMI JOIN is used.

都不是每次都会重新评估整个子查询.

Neither of these methods reevaluates the whole subquery each time.

有关其工作原理的更多详细信息,请参见我的博客中的条目:

See this entry in my blog for more detail on how this works:

四大公司的所有RDBMS都有链接.

There are links for all RDBMS's of the big four.

这篇关于大行集的IN与JOIN的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆