PostgreSQL: NOT IN versus EXCEPT performance difference (edited #2)
Question
I have two queries that are functionally identical. One of them performs very well, the other one performs very poorly. I cannot see where the performance difference comes from.
Query #1:
SELECT id
FROM subsource_position
WHERE
id NOT IN (SELECT position_id FROM subsource)
This returns the following plan:
QUERY PLAN
-------------------------------------------------------------------------------
Seq Scan on subsource_position  (cost=0.00..362486535.10 rows=128524 width=4)
  Filter: (NOT (SubPlan 1))
  SubPlan 1
    ->  Materialize  (cost=0.00..2566.50 rows=101500 width=4)
          ->  Seq Scan on subsource  (cost=0.00..1662.00 rows=101500 width=4)
Query #2:
SELECT id FROM subsource_position
EXCEPT
SELECT position_id FROM subsource;
Plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------
SetOp Except  (cost=24760.35..25668.66 rows=95997 width=4)
  ->  Sort  (cost=24760.35..25214.50 rows=181663 width=4)
        Sort Key: "*SELECT* 1".id
        ->  Append  (cost=0.00..6406.26 rows=181663 width=4)
              ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..4146.94 rows=95997 width=4)
                    ->  Seq Scan on subsource_position  (cost=0.00..3186.97 rows=95997 width=4)
              ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..2259.32 rows=85666 width=4)
                    ->  Seq Scan on subsource  (cost=0.00..1402.66 rows=85666 width=4)
(8 rows)
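Query #2's plan shows why it is fast:
- Append concatenates both inputs.
- A single Sort orders them together.
- SetOp Except then walks through each run of equal keys once.

Here is a simplified Python model of that sort-based strategy. It is not PostgreSQL's actual implementation, and the function name sorted_except is made up for illustration. It assumes non-NULL, comparable values, which matches the NOT NULL columns in the question.

```python
from itertools import groupby


def sorted_except(left, right):
    """Rough model of a sort-based EXCEPT (distinct), in the spirit of SetOp Except.

    Hypothetical helper; assumes non-NULL, mutually comparable values.
    """
    # Append: tag every row with its input (0 = left query, 1 = right query).
    tagged = [(v, 0) for v in left] + [(v, 1) for v in right]
    # Sort: one O((N + M) log(N + M)) sort over both inputs together.
    tagged.sort()
    result = []
    # SetOp: each run of equal keys is visited exactly once.
    for key, group in groupby(tagged, key=lambda row: row[0]):
        sides = {side for _, side in group}
        # Emit the key once if it came only from the left input.
        if sides == {0}:
            result.append(key)
    return result


print(sorted_except([5, 1, 3, 3, 7], [3, 7, 9]))  # [1, 5]
```

The total work is dominated by one sort over roughly N + M rows. No row of one table is ever compared against every row of the other.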
I have a feeling I'm missing either something obviously bad about one of my queries, or I have misconfigured the PostgreSQL server. I would have expected this NOT IN to optimize well; is NOT IN always a performance problem, or is there a reason it does not optimize here?
Additional data:
=> select count(*) from subsource;
count
-------
85158
(1 row)
=> select count(*) from subsource_position;
count
-------
93261
(1 row)
Edit: I have now fixed the A-B != B-A problem mentioned below. But my problem as stated still exists: query #1 is still massively worse than query #2. This, I believe, follows from the fact that both tables have similar numbers of rows.
Edit 2: I'm using PostgreSQL 9.0.4. I cannot use EXPLAIN ANALYZE because query #1 takes too long. All of these columns are NOT NULL, so NULL handling should not cause any difference between the two queries.
Edit 3: I have an index on both these columns. I haven't yet gotten query #1 to complete (gave up after ~10 minutes). Query #2 returns immediately.
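The two queries are only interchangeable under the conditions noted in the edits. The following sketch shows why those conditions matter. It runs both query shapes against an in-memory SQLite database, chosen purely because it ships with Python and follows standard SQL semantics for NOT IN and EXCEPT. The rows are made up, and the result says nothing about PostgreSQL performance. The helper name not_in_vs_except is hypothetical.

```python
import sqlite3


def not_in_vs_except(outer_ids, inner_ids):
    """Run the question's two query shapes on made-up rows (hypothetical helper).

    Uses SQLite only to illustrate standard SQL semantics.
    Returns (NOT IN result, EXCEPT result), each ordered by id.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE subsource_position (id INTEGER)")
        conn.execute("CREATE TABLE subsource (position_id INTEGER)")
        conn.executemany("INSERT INTO subsource_position VALUES (?)",
                         [(i,) for i in outer_ids])
        conn.executemany("INSERT INTO subsource VALUES (?)",
                         [(i,) for i in inner_ids])
        not_in = conn.execute(
            "SELECT id FROM subsource_position "
            "WHERE id NOT IN (SELECT position_id FROM subsource) "
            "ORDER BY id").fetchall()
        except_rows = conn.execute(
            "SELECT id FROM subsource_position "
            "EXCEPT SELECT position_id FROM subsource "
            "ORDER BY 1").fetchall()
    finally:
        conn.close()
    return [r[0] for r in not_in], [r[0] for r in except_rows]


# No NULLs, no duplicate outer ids: both forms agree.
print(not_in_vs_except([1, 2, 3, 4], [2, 4]))  # ([1, 3], [1, 3])
# Duplicate outer ids: EXCEPT removes duplicates, NOT IN keeps them.
print(not_in_vs_except([1, 1, 2], [2]))        # ([1, 1], [1])
# A NULL in the subquery: NOT IN returns no rows at all.
print(not_in_vs_except([1, 2], [2, None]))     # ([], [1])
```

The two queries therefore return the same rows only when the subquery column contains no NULLs and the outer column contains no duplicates.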
Recommended answer
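Since you are running with the default configuration, try raising work_mem. With only 1 MB of work memory, the subquery result most likely ends up spooled to disk. The planner will also not use a hashed SubPlan, because it expects the hash table not to fit in work_mem. That is why the plan for query #1 shows a plain SubPlan over a Materialize node: the materialized subquery result is rescanned for every row of subsource_position. Try 10 or 20 MB.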
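The following is a rough Python model of the two ways a NOT IN subplan can be evaluated. It shows why the work_mem limit matters so much:
- A plain SubPlan rescans the materialized rows for each outer row.
- A hashed SubPlan builds a hash table once and probes it per outer row.

The model counts comparisons instead of measuring time. It ignores NULL handling, since the columns in the question are NOT NULL. The function names are made up for illustration.

```python
def not_in_rescan(outer, inner):
    """Plain SubPlan model: rescan the materialized inner rows per outer row."""
    comparisons = 0
    result = []
    for x in outer:
        found = False
        for y in inner:  # one rescan of the materialized subquery result
            comparisons += 1
            if x == y:
                found = True
                break  # a match settles NOT IN for this outer row
        if not found:
            result.append(x)
    return result, comparisons


def not_in_hashed(outer, inner):
    """Hashed SubPlan model: build a hash table once, then probe it per row."""
    table = set(inner)  # built once; in PostgreSQL it must fit in work_mem
    return [x for x in outer if x not in table]


rows, cost = not_in_rescan(list(range(10)), [1, 3, 5])
print(rows, cost)                                  # [0, 2, 4, 6, 7, 8, 9] 27
print(not_in_hashed(list(range(10)), [1, 3, 5]))   # [0, 2, 4, 6, 7, 8, 9]
```

With the question's row counts (93,261 outer rows and 85,158 subquery rows), the rescanning strategy can need up to about 93,261 × 85,158 ≈ 7.9 × 10^9 comparisons. The hashed strategy needs roughly one pass over each table.

After raising work_mem in the session (for example SET work_mem = '20MB';), the EXPLAIN output for query #1 should show "hashed SubPlan 1". That happens if the planner now estimates the hash table fits.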