SQL - not exists query with millions of records
Question
I'm trying to use the following SQL query (in SAS) to find any records from pool1 that do not exist in pool2. Pool1 has 11,000,000 records and pool2 has 700,000. This is where I run into an issue: I let the query run for 16 hours and it was nowhere near finishing. Is there a more efficient way (in SQL or SAS) to achieve what I need to do?
PROC SQL;
CREATE TABLE ALL AS
SELECT A.ID
FROM POOL1 A
WHERE NOT EXISTS (SELECT B.ID
FROM POOL2 B
WHERE B.ID = A.ID);
QUIT;
Recommended answer
PROC SQL;
CREATE TABLE ALL AS
SELECT A.ID
FROM
POOL1 A
WHERE A.ID NOT IN (SELECT B.ID
FROM
POOL2 B)
;
QUIT;
The change above should return the same result set but take considerably less time to run, because you are not joining POOL2 back to POOL1 but simply excluding the IDs that exist in POOL2.
As stated in another answer, an index may help, but if the ID fields are primary keys it is likely they are already indexed.
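If it turns out that no index exists, one could be added along these lines. This is only a sketch: it assumes POOL2 is a SAS data set in the current library with no index on ID yet, and whether it actually speeds up the query depends on the data and on the plan SAS chooses.

```sas
/* List the data set's attributes; any existing indexes are reported here. */
PROC CONTENTS DATA=POOL2;
RUN;

PROC SQL;
    /* Assumption: POOL2 has no index on ID yet.
       In PROC SQL a simple (single-column) index must have
       the same name as the column it covers. */
    CREATE INDEX ID ON POOL2 (ID);
QUIT;
```

With the index in place, rerunning the query unchanged shows whether it makes a difference.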