PostgreSQL index not used for query on range


Question

I'm using PostgreSQL (9.2.0) and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips
(
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id)
);

I've added indexes on begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num
  ON ips
  USING btree
  (begin_ip_num );

CREATE INDEX index_ips_on_end_ip_num
  ON ips
  USING btree
  (end_ip_num );

The query being used is:

SELECT "ips".* FROM "ips" WHERE (3065106743 BETWEEN begin_ip_num AND end_ip_num);

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

Recommended answer

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the position from which the rest of the index fulfills the first condition. Then, for each value of the first column, it can return all rows that fulfill the second condition until the first one fails, then jump to the next value of the first column, and so on.

This is still not very efficient, and Postgres may be faster just scanning the first index column and filtering on the second. It very much depends on your data distribution.
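
One way to see which plan wins on your data is to compare EXPLAIN output before and after creating the multicolumn index. This is only a sketch that reuses the table, index, and query from this thread. The BUFFERS option is an addition here, not something the question used; it just reports how many pages each plan touched.

```sql
-- Plan with only the two single-column indexes in place
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ips WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

-- Multicolumn index proposed above (skip this if you already created it)
CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

-- Refresh planner statistics so the new index is costed fairly
ANALYZE ips;

-- Same query again: compare the chosen index, "Rows Removed by Filter",
-- and the total runtime against the first plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM ips WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;
```

If the planner keeps choosing index_ips_on_begin_ip_num, or the runtime barely changes, that suggests the multicolumn index does not buy much for your data distribution.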

What would really help here is a GiST index on an int8range column, available since PostgreSQL 9.2.
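
A rough sketch of how that could look without changing the table layout is an expression index that builds the range from the two existing columns. The index name index_ips_on_ip_range is made up for illustration, and the '[]' bounds assume that both begin_ip_num and end_ip_num are inclusive, as the BETWEEN query implies.

```sql
-- GiST expression index over a range built from the two bigint columns.
-- Assumes begin_ip_num <= end_ip_num in every row; otherwise the range
-- constructor raises an error while the index is being built.
CREATE INDEX index_ips_on_ip_range
  ON ips
  USING gist
  (int8range(begin_ip_num, end_ip_num, '[]'));

-- The WHERE expression must match the indexed expression exactly
-- for the planner to consider the index. @> means "range contains element".
SELECT *
FROM   ips
WHERE  int8range(begin_ip_num, end_ip_num, '[]') @> 3065106743::bigint;
```

Note that a NULL in either column yields a range that is unbounded on that side, which may or may not be what you want. Storing the range in a real int8range column instead of computing it works the same way; the index is then declared on that column directly.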

Barring that, you can check out this closely related answer on dba.SE with a rather sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance.

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num;

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. This can help performance a lot if you have lots of rows per value of the first column. Otherwise it is not effective.

Also, is autovacuum running, or have you run ANALYZE on the table? Postgres needs current statistics to pick appropriate query plans.
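
To check this, something along the following lines should work. It is a sketch that assumes the table lives in a schema visible through pg_stat_user_tables and is named ips as in the question.

```sql
-- Refresh planner statistics for the table by hand
ANALYZE ips;

-- When was the table last analyzed or vacuumed, manually or by autovacuum?
-- NULL in a column means that operation has not been recorded for the table.
SELECT relname, last_analyze, last_autoanalyze, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = 'ips';

-- Is the autovacuum launcher enabled at all?
SHOW autovacuum;
```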
