SQL - 不存在具有数百万条记录的查询 [英] SQL - not exists query with millions of records

查看:26
本文介绍了SQL - 不存在具有数百万条记录的查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用以下 SQL 查询(在 SAS 中)从 pool1 中查找 pool2 中不存在的任何记录.Pool1 有 11,000,000 条记录,pool2 有 700,000 条记录.这是我遇到问题的地方.我让查询运行了 16 个小时,但离完成还差得很远.有没有更有效的方法(在 SQL 或 SAS 中)来实现我需要做的事情?

I'm trying to use the following SQL query (in SAS) to find any records from pool1 that do not exist in pool2. Pool1 has 11,000,000 records, pool2 has 700,000. This is where I run into an issue. I let the query run for 16 hours and it was nowhere near finishing. Is there a more efficient way (in SQL or SAS) to achieve what I need to do?

PROC SQL;
CREATE TABLE ALL AS
SELECT A.ID
FROM POOL1 A
WHERE NOT EXISTS (SELECT B.ID
                  FROM POOL2 B
                  WHERE B.ID = A.ID);
QUIT;

推荐答案

PROC SQL;
CREATE TABLE ALL AS
SELECT A.ID
    FROM
        POOL1 A
    WHERE A.ID NOT IN (SELECT B.ID
                        FROM
                            POOL2 B)
                        ;

上述更改应该返回相同的结果集,但运行时间要少得多,因为您不是尝试将 POOL2 加入回 POOL1,而是简单地排除 POOL2 中存在的结果.

The above change should return the same result set but take considerably less time to run as you are not trying to join POOL2 back to POOL1 but simply excluding results which exist in POOL2.

正如另一个答案中所述,INDEX 可能会有所帮助,但如果 ID 字段是主键,则它们很可能已经在 INDEX 中受到影响.

As stated in another answer, an INDEX may help but if the ID fields are the primary keys it is likely they are already subject to in INDEX.

这篇关于SQL - 不存在具有数百万条记录的查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆