PostgreSQL index not used for query on IP ranges


Question


I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips (
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id)
);


I've added plain B-tree indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num);

The query in use is:

SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;
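
As an aside, the literal 3065106743 is presumably an IPv4 address packed big-endian into a bigint; a quick sketch of that encoding (the dotted quad below is derived from the number, not stated in the question):

```sql
-- 182.177.209.55 packed big-endian into a bigint (assumed encoding):
SELECT (182::bigint << 24) | (177 << 16) | (209 << 8) | 55;  -- yields 3065106743
```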


The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms


I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

Answer


Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);


Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.


With the index I propose, Postgres can scan the first column and find the point in the index from which the rest of the entries fulfill the first condition. Then, for each value of the first column, it can return all rows that fulfill the second condition, until the first one fails, then jump to the next value of the first column, and so on.
This is still not very efficient, and Postgres may be faster just scanning the first index column and filtering on the second. Much depends on your data distribution.


Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num;


This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. It can help performance a lot if you have many rows per value of the first column; otherwise it is hardly effective.
(There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)


Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.
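
To refresh statistics manually and check when (auto)vacuum and analyze last touched the table, something along these lines works; `pg_stat_user_tables` is a standard statistics view:

```sql
-- Refresh planner statistics for the table:
ANALYZE ips;

-- When did vacuum/analyze (manual or automatic) last run on it?
SELECT last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'ips';
```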


What would really help here is a GiST index on an int8range column, available since PostgreSQL 9.2.
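
A minimal sketch of that approach; the column and index names below are illustrative, not from the original schema:

```sql
-- Replace the two bigint bounds with a single int8range column:
ALTER TABLE ips ADD COLUMN ip_range int8range;
UPDATE ips SET ip_range = int8range(begin_ip_num, end_ip_num, '[]');  -- '[]' = inclusive bounds

-- GiST index supporting range operators:
CREATE INDEX index_ips_on_ip_range ON ips USING gist (ip_range);

-- The "range contains element" operator @> can use this index:
SELECT * FROM ips WHERE ip_range @> 3065106743::bigint;
```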

Further reading:


If your IP ranges can be covered with one of the built-in network types inet or cidr, consider replacing your two bigint columns. Or, better yet, look at the additional module ip4r by Andrew Gierth (not in the standard distribution). The indexing strategy changes accordingly.


Barring that, you can check out this related answer on dba.SE, which uses a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance.
