PostgreSQL: How to figure out missing numbers in a column using generate_series()?

Question

SELECT commandid 
FROM results 
WHERE NOT EXISTS (
    SELECT * 
    FROM generate_series(0,119999) 
    WHERE generate_series = results.commandid 
    );

I have a column commandid in results of type int, but various tests failed and their rows were never added to the table. I would like to create a query that returns the list of commandid values that are not found in results. I thought the above query would do what I wanted. However, it does not work even if I use a range that lies outside the expected possible range of commandid (like negative numbers).

Recommended answer

Given sample data:

create table results ( commandid integer primary key);
insert into results (commandid) select * from generate_series(1,1000);
delete from results where random() < 0.20;

This works:

SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i);

As does this alternative formulation:

SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
LEFT OUTER JOIN results ON (results.commandid = s.i) 
WHERE results.commandid IS NULL;

Both of the above appear to result in identical query plans in my tests, but you should compare with your data on your database using EXPLAIN ANALYZE to see which is best.
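
To see why the query in the question cannot work, and that the two formulations above return the same rows, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example is self-contained. Since generate_series() may not be available there, a recursive CTE named series (a hypothetical name) plays its role, and the table holds a handful of hand-picked ids rather than the random sample above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (commandid INTEGER PRIMARY KEY)")
# Ids 0..9 are expected; 3 and 7 "failed" and were never inserted.
conn.executemany(
    "INSERT INTO results VALUES (?)",
    [(i,) for i in range(10) if i not in (3, 7)],
)

# Assumption: a recursive CTE standing in for generate_series(0, 9).
SERIES = (
    "WITH RECURSIVE series(i) AS ("
    "  SELECT 0 UNION ALL SELECT i + 1 FROM series WHERE i < 9"
    ") "
)

# Direction used in the question: scan results, keep rows with no match
# in the series. Only ids that are stored in results can ever be returned.
question = conn.execute(SERIES + """
    SELECT commandid FROM results
    WHERE NOT EXISTS (SELECT 1 FROM series WHERE series.i = results.commandid)
    ORDER BY commandid
""").fetchall()

# Direction used in the answer: scan the series, keep values with no match
# in results.
not_exists = conn.execute(SERIES + """
    SELECT s.i FROM series s
    WHERE NOT EXISTS (SELECT 1 FROM results WHERE commandid = s.i)
    ORDER BY s.i
""").fetchall()

left_join = conn.execute(SERIES + """
    SELECT s.i FROM series s
    LEFT OUTER JOIN results ON results.commandid = s.i
    WHERE results.commandid IS NULL
    ORDER BY s.i
""").fetchall()

print(question, not_exists, left_join)
```

With this data the question's form returns no rows, because every stored id falls inside the series, while both answer forms return 3 and 7. Query plans and timings in SQLite say nothing about PostgreSQL, so EXPLAIN ANALYZE on the real database is still needed for performance.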

Note that instead of NOT IN I've used NOT EXISTS with a subquery in one formulation, and an ordinary OUTER JOIN in the other. It's much easier for the DB server to optimise these and it avoids the confusing issues that can arise with NULLs in NOT IN.
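
The NULL issue can be illustrated with a small sketch, again using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made for runnability; both follow SQL's three-valued logic here). The tables results_nullable and expected are hypothetical. The sample results table above declares commandid as a primary key, so it cannot contain NULLs and would not show the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table without a NOT NULL constraint, so a NULL can be stored.
conn.execute("CREATE TABLE results_nullable (commandid INTEGER)")
conn.executemany(
    "INSERT INTO results_nullable VALUES (?)", [(1,), (2,), (None,)]
)
# Hypothetical list of ids we expect to find; 3 is the missing one.
conn.execute("CREATE TABLE expected (i INTEGER)")
conn.executemany("INSERT INTO expected VALUES (?)", [(1,), (2,), (3,)])

# 3 NOT IN (1, 2, NULL) evaluates to NULL rather than true,
# so the row for 3 is filtered out.
not_in = conn.execute("""
    SELECT i FROM expected
    WHERE i NOT IN (SELECT commandid FROM results_nullable)
    ORDER BY i
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row
# never matches, so 3 is reported as missing.
not_exists = conn.execute("""
    SELECT i FROM expected e
    WHERE NOT EXISTS (
        SELECT 1 FROM results_nullable r WHERE r.commandid = e.i
    )
    ORDER BY i
""").fetchall()

print(not_in, not_exists)
```

A single NULL in the subquery makes the NOT IN form return no rows at all, while the NOT EXISTS form still reports 3.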

I initially favoured the OUTER JOIN formulation, but at least in 9.1 with my test data the NOT EXISTS form optimizes to the same plan.

Both will perform better than the NOT IN formulation below when the series is large, as in your case. NOT IN used to require Pg to do a linear search of the IN list for every tuple being tested, but examination of the query plan suggests Pg may be smart enough to hash it now. The NOT EXISTS (transformed into a JOIN by the query planner) and the JOIN work better.

The NOT IN formulation is both confusing in the presence of NULL commandids and can be inefficient:

SELECT s.i AS missing_cmd
FROM generate_series(0,1000) s(i)
WHERE s.i NOT IN (SELECT commandid FROM results);

so I'd avoid it. With 1,000,000 rows the other two completed in 1.2 seconds and the NOT IN formulation ran CPU-bound until I got bored and cancelled it.
