PostgreSQL: NOT IN versus EXCEPT performance difference (edited #2)

Views: 104 · Published: 2020/5/29 20:08:13 · Tags: sql, postgresql

Problem description

I have two queries that are functionally identical. One of them performs very well; the other performs very poorly. I do not see where the performance difference comes from.

Query #1:

SELECT id
FROM subsource_position
WHERE
  id NOT IN (SELECT position_id FROM subsource);

This returns the following plan:

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Seq Scan on subsource_position  (cost=0.00..362486535.10 rows=128524 width=4)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..2566.50 rows=101500 width=4)
           ->  Seq Scan on subsource  (cost=0.00..1662.00 rows=101500 width=4)

Query #2:

SELECT id FROM subsource_position
EXCEPT
SELECT position_id FROM subsource;

Plan:

                                           QUERY PLAN                                            
-------------------------------------------------------------------------------------------------
 SetOp Except  (cost=24760.35..25668.66 rows=95997 width=4)
   ->  Sort  (cost=24760.35..25214.50 rows=181663 width=4)
         Sort Key: "*SELECT* 1".id
         ->  Append  (cost=0.00..6406.26 rows=181663 width=4)
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..4146.94 rows=95997 width=4)
                     ->  Seq Scan on subsource_position  (cost=0.00..3186.97 rows=95997 width=4)
               ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..2259.32 rows=85666 width=4)
                     ->  Seq Scan on subsource  (cost=0.00..1402.66 rows=85666 width=4)
(8 rows)

I have a feeling that I am either missing something obviously bad about one of my queries, or that I have misconfigured the PostgreSQL server. I would have expected this NOT IN to optimize well. Is NOT IN always a performance problem, or is there a reason it does not optimize here?
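For comparison, a third formulation that is often tried in this situation is an anti-join written with NOT EXISTS. The sketch below is not part of the original question; it only reuses the table and column names shown above, and the aliases `sp` and `s` are illustrative. It returns the same rows as query #1 only when neither column contains NULLs, and unlike EXCEPT it does not remove duplicate `id` values.

```sql
-- Anti-join sketch: rows of subsource_position with no matching subsource row.
-- Assumes id and position_id are NOT NULL, as stated in the question.
SELECT sp.id
FROM subsource_position AS sp
WHERE NOT EXISTS (
    SELECT 1
    FROM subsource AS s
    WHERE s.position_id = sp.id
);
```

Running `EXPLAIN` on this form shows whether the planner picks an anti-join for it, which gives a useful third data point alongside the two plans above.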

Additional data:

=> select count(*) from subsource;
 count 
-------
 85158
(1 row)

=> select count(*) from subsource_position;
 count 
-------
 93261
(1 row)

Edit: I have now fixed the A-B != B-A problem mentioned below. But my problem as stated still exists: query #1 is still massively worse than query #2. This, I believe, follows from the fact that both tables have similar numbers of rows.

Edit 2: I'm using PostgreSQL 9.0.4. I cannot use EXPLAIN ANALYZE because query #1 takes too long. All of these columns are NOT NULL, so there should be no difference as a result of that.
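The NOT NULL point matters for correctness as well as planning: if the subquery of a NOT IN returns even one NULL, the predicate is never true and the query returns no rows, whereas EXCEPT treats NULLs as equal to each other. A minimal sketch for confirming the claim against the data itself, using only the names from the question (the output aliases are illustrative):

```sql
-- Both counts should be 0 if the compared columns really contain no NULLs.
SELECT count(*) AS null_ids
FROM subsource_position
WHERE id IS NULL;

SELECT count(*) AS null_position_ids
FROM subsource
WHERE position_id IS NULL;
```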

Edit 3: I have an index on both of these columns. I haven't yet gotten query #1 to complete (I gave up after ~10 minutes). Query #2 returns immediately.
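The plan for query #1 shows a plain `SubPlan` fed by a `Materialize` node, which means the materialized subquery result is rescanned for each outer row. PostgreSQL can instead run a NOT IN subquery as a hashed SubPlan, but only when it estimates that the hash table fits in `work_mem`. One thing that may be worth testing, offered as an assumption rather than a confirmed diagnosis, is whether a larger session-level `work_mem` changes the plan. The value below is purely illustrative.

```sql
-- Inspect the current setting, then raise it for this session only.
SHOW work_mem;
SET work_mem = '16MB';  -- illustrative value, not a recommendation

-- Plain EXPLAIN does not execute the query, so it returns immediately.
-- Look for "Filter: (NOT (hashed SubPlan 1))" in place of the Materialize node.
EXPLAIN
SELECT id
FROM subsource_position
WHERE id NOT IN (SELECT position_id FROM subsource);

-- Restore the session default afterwards.
RESET work_mem;
```

If the filter line changes to a hashed SubPlan and the estimated cost drops sharply, the memory setting is a likely factor. If the plan stays the same, the cause lies elsewhere.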
