Postgres NOT IN performance


Question

Any ideas how to speed up this query?

Input

EXPLAIN SELECT entityid FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
AND (entityid NOT IN
(1377776,1377792,1377793,1377794,1377795,1377796... 50000 ids)
)

Output

Nested Loop  (cost=0.00..1452373.79 rows=3865 width=8)
  ->  Nested Loop  (cost=0.00..8.58 rows=1 width=8)
        Join Filter: (l1.level2_level2id = l2.level2id)
        ->  Seq Scan on level2entity l2  (cost=0.00..3.17 rows=1 width=8)
              Filter: ((userid)::text = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'::text)
        ->  Seq Scan on level1entity l1  (cost=0.00..4.07 rows=107 width=16)
  ->  Index Scan using fk_fk18edb1cfb2a41235_idx on entity e  (cost=0.00..1452086.09 rows=22329 width=16)
        Index Cond: (level1_level1id = l1.level1id)

OK, here is a simplified version; the joins aren't the bottleneck:

SELECT entityid FROM
(SELECT entityid FROM entity e LIMIT 5000) a
WHERE
(entityid NOT IN
(1377776,1377792,1377793,1377794,1377795, ... 50000 ids)
)

The problem is to find the entities that don't have any of these ids.

EXPLAIN

Subquery Scan on a  (cost=0.00..312667.76 rows=1 width=8)
  Filter: (e.entityid <> ALL ('{1377776,1377792,1377793,1377794, ... 50000 ids}'::bigint[]))
  ->  Limit  (cost=0.00..111.51 rows=5000 width=8)
        ->  Seq Scan on entity e  (cost=0.00..29015.26 rows=1301026 width=8)


Recommended answer

A huge IN list is very inefficient. Ideally PostgreSQL would recognize it and turn it into a relation that it does an anti-join on, but at this point the query planner doesn't know how to do that. The planning time needed to detect this case would also be paid by every query that uses NOT IN sensibly, so it would have to be a very low-cost check. See this earlier, much more detailed answer on the topic.

As David Aldridge wrote, this is best solved by turning it into an anti-join. I'd write it as a join over a VALUES list, simply because PostgreSQL is extremely fast at parsing VALUES lists into relations, but the effect is the same:

SELECT entityid 
FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
LEFT OUTER JOIN (
    VALUES
    (1377776),(1377792),(1377793),(1377794),(1377795),(1377796)
) ex(ex_entityid) ON (entityid = ex_entityid)
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f' 
AND ex_entityid IS NULL; 
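
For comparison, the same anti-join can also be spelled with NOT EXISTS over the VALUES list. This is only a sketch of an alternative phrasing, not something from the answer itself; it reuses the table and column names from the question, and the short id list stands in for the full set. Unlike NOT IN, NOT EXISTS has no surprising behaviour when NULLs are involved.

```sql
-- Sketch: anti-join written as NOT EXISTS against a VALUES list.
-- The six ids are the ones shown in the question; the real list is much longer.
SELECT e.entityid
FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
  AND NOT EXISTS (
      SELECT 1
      FROM (
          VALUES
          (1377776),(1377792),(1377793),(1377794),(1377795),(1377796)
      ) AS ex(ex_entityid)
      WHERE ex.ex_entityid = e.entityid
  );
```

Whether this plans any differently from the LEFT JOIN ... IS NULL form depends on the PostgreSQL version and the data, so compare both with EXPLAIN ANALYZE before picking one.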

For a sufficiently large set of values you might even be better off creating a temporary table, COPYing the values into it, creating a PRIMARY KEY on it, and joining on that.
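
The temporary-table route might look roughly like the following when run as a psql script. Treat it as a sketch under assumptions: the table name excluded_ids is made up for illustration, the column is typed bigint because the EXPLAIN output in the question casts the list to bigint[], and only the six ids shown in the question are loaded. The inline COPY ... FROM STDIN data block terminated by \. is psql-specific; other clients would stream the rows through their own COPY support.

```sql
BEGIN;

-- Hypothetical table name; ON COMMIT DROP cleans it up automatically.
CREATE TEMPORARY TABLE excluded_ids (
    ex_entityid bigint NOT NULL
) ON COMMIT DROP;

-- Bulk-load the ids, one per line (psql reads the rows up to the \. marker).
COPY excluded_ids (ex_entityid) FROM STDIN;
1377776
1377792
1377793
1377794
1377795
1377796
\.

-- Add the key after loading so the index is built once, not row by row.
ALTER TABLE excluded_ids ADD PRIMARY KEY (ex_entityid);

-- Autovacuum does not process temporary tables, so gather statistics manually.
ANALYZE excluded_ids;

-- Anti-join: keep only entities with no match in excluded_ids.
SELECT e.entityid
FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
LEFT JOIN excluded_ids ex ON ex.ex_entityid = e.entityid
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
  AND ex.ex_entityid IS NULL;

COMMIT;
```

Keeping everything in one transaction means the temporary table disappears at COMMIT; drop ON COMMIT DROP if the same id set is reused across several queries in the session.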

More possibilities explored here:

https://stackoverflow.com/a/17038097/398670
