JOIN versus EXISTS performance

Question

Generally speaking, is there a performance difference between using a JOIN to select rows versus an EXISTS where clause? Searching various Q&A web sites suggests that a join is more efficient, but I recall learning a long time ago that EXISTS was better in Teradata.

I do see other SO answers, like this and this, but my question is specific to Teradata.

For example, consider these two queries, which return identical results:

select   svc.ltv_scr, count(*) as freq
from     MY_BASE_TABLE svc
join     MY_TARGET_TABLE x
on       x.srv_accs_id=svc.srv_accs_id
group by 1
order by 1

-and-

select   svc.ltv_scr, count(*) as freq
from     MY_BASE_TABLE svc
where exists(
    select 1
    from   MY_TARGET_TABLE x
    where  x.srv_accs_id=svc.srv_accs_id)
group by 1
order by 1

The primary index (unique) on both tables is 'srv_accs_id'. MY_BASE_TABLE is rather large (200 million rows) and MY_TARGET_TABLE relatively small (200,000 rows).
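The two queries above return identical results only because srv_accs_id is unique in MY_TARGET_TABLE: an inner join emits one output row per matching target row, while EXISTS keeps each base row at most once. The minimal sketch below uses Python's built-in sqlite3 module purely as a stand-in for Teradata, so it shows result semantics only and says nothing about Teradata plans or timings. The tables base_t and target_t and their rows are hypothetical miniatures of the question's tables.

```python
import sqlite3

# The question's two queries, pointed at hypothetical miniature tables.
JOIN_Q = """
select   svc.ltv_scr, count(*) as freq
from     base_t svc
join     target_t x
on       x.srv_accs_id = svc.srv_accs_id
group by 1
order by 1
"""

EXISTS_Q = """
select   svc.ltv_scr, count(*) as freq
from     base_t svc
where exists (
    select 1
    from   target_t x
    where  x.srv_accs_id = svc.srv_accs_id)
group by 1
order by 1
"""


def build(target_ids):
    """Create hypothetical base/target tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table base_t (srv_accs_id integer primary key, ltv_scr integer)")
    conn.execute("create table target_t (srv_accs_id integer)")
    # Four base rows: ids 1..4 with scores 10, 10, 20, 20.
    conn.executemany("insert into base_t values (?, ?)",
                     [(1, 10), (2, 10), (3, 20), (4, 20)])
    conn.executemany("insert into target_t values (?)", [(i,) for i in target_ids])
    return conn


def freq(conn, query):
    """Run a frequency query and return its rows as a list of tuples."""
    return conn.execute(query).fetchall()


# Unique target keys (as in the question): both forms agree.
unique = build([1, 3, 4])
print(freq(unique, JOIN_Q), freq(unique, EXISTS_Q))  # [(10, 1), (20, 2)] [(10, 1), (20, 2)]

# Target key 3 duplicated: the join counts base row 3 twice, EXISTS does not.
dup = build([1, 3, 3, 4])
print(freq(dup, JOIN_Q), freq(dup, EXISTS_Q))  # [(10, 1), (20, 3)] [(10, 1), (20, 2)]
```

So the choice between the two forms is a performance question only when the join key is unique on the target side; otherwise they answer different questions.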

There is one significant difference in the EXPLAIN plans: The first says the two tables are joined "by way of a RowHash match scan" and the second says "by way of an all-rows scan". Both say it is "an all-AMPs JOIN step" and the total estimated time is identical (0.32 seconds).

Both queries perform the same (I'm using Teradata 13.10).

A similar experiment to find non-matches, comparing a LEFT OUTER JOIN plus a corresponding IS NULL where clause against a NOT EXISTS subquery, does show a performance difference:

select   svc.ltv_scr, count(*) as freq
from     MY_BASE_TABLE svc
left outer join MY_TARGET_TABLE x
on       x.srv_accs_id=svc.srv_accs_id
where    x.srv_accs_id is null
group by 1
order by 1

-and-

select   svc.ltv_scr, count(*) as freq
from     MY_BASE_TABLE svc
where not exists(
    select 1
    from   MY_TARGET_TABLE x
    where  x.srv_accs_id=svc.srv_accs_id)
group by 1
order by 1 
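As with the first pair, these two anti-join forms should return the same rows. The sketch below checks that on hypothetical miniature tables, again using SQLite only as a stand-in for Teradata (results only, no plans or timings).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature tables; srv_accs_id is unique in both, as in the question.
conn.execute("create table base_t (srv_accs_id integer primary key, ltv_scr integer)")
conn.execute("create table target_t (srv_accs_id integer primary key)")
conn.executemany("insert into base_t values (?, ?)",
                 [(1, 10), (2, 10), (3, 20), (4, 20), (5, 30)])
conn.executemany("insert into target_t values (?)", [(1,), (3,)])

LEFT_JOIN_Q = """
select   svc.ltv_scr, count(*) as freq
from     base_t svc
left outer join target_t x
on       x.srv_accs_id = svc.srv_accs_id
where    x.srv_accs_id is null
group by 1
order by 1
"""

NOT_EXISTS_Q = """
select   svc.ltv_scr, count(*) as freq
from     base_t svc
where not exists (
    select 1
    from   target_t x
    where  x.srv_accs_id = svc.srv_accs_id)
group by 1
order by 1
"""

anti_join = conn.execute(LEFT_JOIN_Q).fetchall()
not_exists = conn.execute(NOT_EXISTS_Q).fetchall()
# Base ids 2, 4 and 5 have no match in target_t.
print(anti_join, not_exists)  # [(10, 1), (20, 1), (30, 1)] [(10, 1), (20, 1), (30, 1)]
```

The IS NULL test is applied to x.srv_accs_id, the join key itself. That is what makes the LEFT JOIN form a faithful anti-join: a matched target row always has a non-null join key, so only NULL-extended (unmatched) rows survive the filter.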

The second query plan is faster: EXPLAIN estimates 2.14 seconds for it versus 2.21 seconds for the first.

My example may be too trivial to see a difference; I'm just looking for coding guidance.

Answer

NOT EXISTS is more efficient than a LEFT OUTER JOIN with an IS NULL condition for excluding records that are missing from the participating table, because the optimizer will elect to use an EXCLUSION MERGE JOIN for the NOT EXISTS predicate.
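As a conceptual model only, not Teradata's actual implementation, an exclusion (anti) merge join can be pictured as a single pass over two inputs sorted on the join key, keeping each left row whose key never appears on the right. The function name exclusion_merge and the sample data are hypothetical, and the sketch uses plain key order where Teradata merges in row-hash order.

```python
from typing import List, Tuple


def exclusion_merge(left: List[Tuple[int, str]], right_keys: List[int]) -> List[Tuple[int, str]]:
    """Conceptual anti-join: keep left rows whose key has no match in right_keys.

    Both inputs must already be sorted by key (a simplification of row-hash
    order). Each input is read once, so the cost is O(len(left) + len(right_keys)).
    """
    out = []
    j = 0
    for key, payload in left:
        # Advance the right cursor past keys smaller than the current left key.
        while j < len(right_keys) and right_keys[j] < key:
            j += 1
        # Emit the left row only if the right side has no equal key.
        if j == len(right_keys) or right_keys[j] != key:
            out.append((key, payload))
    return out


left = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
right = [1, 3, 3, 4]
print(exclusion_merge(left, right))  # [(2, 'b'), (5, 'e')]
```

In this model the anti-join never builds NULL-extended rows only to discard the matched ones afterwards, and duplicate right-side keys (the two 3s) add no output rows, which is one plausible way to read why the exclusion plan can be cheaper than LEFT OUTER JOIN plus IS NULL.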

While your second test did not yield impressive results for the data sets you were using, the performance gain from NOT EXISTS over a LEFT JOIN becomes very noticeable as data volumes increase. Keep in mind that the tables will need to be hash-distributed by the columns that participate in the NOT EXISTS join, just as they would be for the LEFT JOIN. Therefore, data skew can affect the performance of the EXCLUSION MERGE JOIN.
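To see why skew matters, here is a simplified model of hash redistribution: each row is sent to the AMP chosen by a hash of its join key, so one very common key value lands entirely on a single AMP, which then does a disproportionate share of the join work. zlib.crc32 is only a stand-in for Teradata's row-hash function, and the key columns and the AMP count of 8 are made up. On an actual Teradata system, the per-AMP distribution of a column is commonly inspected with HASHAMP(HASHBUCKET(HASHROW(column))).

```python
import zlib
from collections import Counter


def amp_distribution(keys, n_amps):
    """Count rows per simulated AMP when rows are redistributed by a hash of the key.

    zlib.crc32 is a stand-in; it is NOT Teradata's row-hash function.
    """
    counts = Counter(zlib.crc32(str(k).encode()) % n_amps for k in keys)
    return [counts.get(amp, 0) for amp in range(n_amps)]


def skew_ratio(per_amp):
    """Rows on the busiest AMP divided by the mean rows per AMP (1.0 = perfectly even)."""
    mean = sum(per_amp) / len(per_amp)
    return max(per_amp) / mean


# Hypothetical join-key columns of 10,000 rows each, spread over 8 AMPs.
even_keys = list(range(10_000))                    # all keys distinct
skewed_keys = [0] * 5_000 + list(range(1, 5_001))  # half the rows share key 0

even = amp_distribution(even_keys, 8)
skewed = amp_distribution(skewed_keys, 8)
print(round(skew_ratio(even), 2), round(skew_ratio(skewed), 2))
```

Because all 5,000 rows with key 0 hash to the same AMP, the skewed column's busiest AMP holds at least four times the average load, and the join step can finish no sooner than that AMP does.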

Typically, I would use EXISTS as a replacement for IN rather than for rewriting a join. This is especially true when the column(s) participating in the logical comparison can be NULL. That's not to say you couldn't use EXISTS in place of an INNER JOIN: instead of an EXCLUSION JOIN you will end up with an INCLUSION JOIN, and an INNER JOIN is in essence an inclusion join to begin with. I'm sure there are some nuances I am overlooking, but you can find those in the manuals if you wish to take the time to read them.
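The NULL case is where IN and EXISTS diverge most visibly in their negated forms. Under standard SQL three-valued logic, `x NOT IN (subquery)` is UNKNOWN for every x that is not found whenever the subquery returns a NULL, so no row qualifies, while NOT EXISTS simply treats the NULL row as a non-match. The sketch below demonstrates this in SQLite as a stand-in; Teradata should behave the same way under the standard semantics, but that is an assumption here, and the tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; target_t deliberately contains a NULL key.
conn.execute("create table base_t (srv_accs_id integer)")
conn.execute("create table target_t (srv_accs_id integer)")
conn.executemany("insert into base_t values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into target_t values (?)", [(1,), (None,)])

# NOT IN: for id 2, "2 <> 1 AND 2 <> NULL" is UNKNOWN, so the row is rejected.
not_in = conn.execute(
    "select srv_accs_id from base_t "
    "where srv_accs_id not in (select srv_accs_id from target_t) "
    "order by 1").fetchall()

# NOT EXISTS: the NULL target row never satisfies the equality, so 2 and 3 are kept.
not_exists = conn.execute(
    "select b.srv_accs_id from base_t b "
    "where not exists (select 1 from target_t t where t.srv_accs_id = b.srv_accs_id) "
    "order by 1").fetchall()

print(not_in, not_exists)  # [] [(2,), (3,)]
```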