用IN的postgres查询非常慢 [英] postgres query with IN is very slow

查看:533
本文介绍了用IN的postgres查询非常慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表,该表的索引为(第A列,B列)。而且我正在运行一个查询,如下所示:

I have a table which has an index on (column A, column B). And I'm running a query that looks like this:

SELECT * FROM table WHERE (A, B) IN ((a_1, b_1), (a_2, b_2), ..., (a_5000, b_5000))

此查询很慢!
该计划如下所示:

This query is very slow ! The plan looks like:

Bitmap Heap Scan on table
  Recheck Cond: (((A = a_1) AND (B = b_1)) OR ((A = a_2) AND (B = b_2)) OR ...
  ->  BitmapOr
        ->  Bitmap Index Scan on idx
              Index Cond: ((A = a_1) AND (B = b_1))
        ->  Bitmap Index Scan on idx
              Index Cond: ((A = a_2) AND (B = b_2))
        ...(5000 other Bitmax Index Scan)

而不是用5000进行一次索引扫描值,postgres似乎一次执行5000个索引扫描,一次解释一个值,这解释了为什么查询如此缓慢。

Instead of doing one index scan with 5000 values, postgres seems to be doing 5000 index scan with one value at a time, which explains why the query is so slow.

实际上,这样做的速度更快:

Actually it is way faster to do someting like:

SELECT * FROM table WHERE A IN (a_1, ..., a_5000)

获取结果,然后在应用程序内部的B列(Python)上进行过滤。

fetch the results and then filter on column B inside the app (python).

我真的更希望结果已经由Postgres在合理的运行时间下进行过滤。是一种解决方法?

I'd really prefer to have the results already filtered by postgres with a reasonable running time. Is there a workaround ?

推荐答案

尝试加入CTE:

with value_list (a,b) as (
  values 
      (a_1, b_1), 
      (a_2, b_2), ..., 
      (a_5000, b_5000) 
)
select *
from table t
  join value_list v on (t.a, t.b) = (v.a, v.b);

(假定值列表中没有重复项)

(This assumes you have no duplicates in the list of values)

这篇关于用IN的postgres查询非常慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆