SQL-不存在具有数百万条记录的查询 [英] SQL - not exists query with millions of records

查看:126
本文介绍了SQL-不存在具有数百万条记录的查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用以下SQL查询(在SAS中)从pool1查找在pool2中不存在的任何记录. Pool1具有11,000,000条记录,pool2具有700,000条记录.这是我遇到的一个问题.我让查询运行了16个小时,而且还远远没有完成.是否有更有效的方法(在SQL或SAS中)实现我需要做的事情?

I'm trying to use the following SQL query (in SAS) to find any records from pool1 that do not exist in pool2. Pool1 has 11,000,000 records, pool2 has 700,000. This is where I run into an issue. I let the query run for 16 hours and it was nowhere near finishing. Is there a more efficient way (in SQL or SAS) to achieve what I need to do?

PROC SQL;
CREATE TABLE ALL AS
SELECT A.ID
FROM POOL1 A
WHERE NOT EXISTS (SELECT B.ID
                  FROM POOL2 B
                  WHERE B.ID = A.ID);
QUIT;

推荐答案

PROC SQL;
CREATE TABLE ALL AS
SELECT A.ID
    FROM
        POOL1 A
    WHERE A.ID NOT IN (SELECT B.ID
                        FROM
                            POOL2 B)
                        ;

上面的更改应该返回相同的结果集,但是运行时间要少得多,因为您没有尝试将POOL2重新加入POOL1,而只是排除了POOL2中存在的结果.

The above change should return the same result set but take considerably less time to run as you are not trying to join POOL2 back to POOL1 but simply excluding results which exist in POOL2.

如另一个答案中所述,INDEX可能会有所帮助,但是如果ID字段是主键,则很可能已经在INDEX中对其进行了约束.

As stated in another answer, an INDEX may help but if the ID fields are the primary keys it is likely they are already subject to in INDEX.

这篇关于SQL-不存在具有数百万条记录的查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆