Postgres结合了多个索引 [英] Postgres combining multiple Indexes

查看:75
本文介绍了Postgres结合了多个索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有下表/索引-

CREATE TABLE test
(
   coords geography(Point,4326), 
   user_id varchar(50), 
   created_at timestamp
);
CREATE INDEX ix_coords ON test USING GIST (coords);
CREATE INDEX ix_user_id ON test (user_id);
CREATE INDEX ix_created_at ON test (created_at DESC);

这是我要执行的查询:

select * 
from updates 
where ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000) 
and user_id='3212312' 
order by created_at desc
limit 60

当我运行查询时,它仅使用ix_coords索引.如何确保Postgres在查询中也使用ix_user_idix_created_at索引?

When I run the query it only uses ix_coords index. How can I ensure that Postgres uses ix_user_id and ix_created_at index as well for the query?

这是一张新表,我在其中批量插入了生产数据. test表中的总行: 15,069,489

This is a new table in which I did bulk insert of production data. Total rows in the test table: 15,069,489

我正在使用(effective_cache_size = 2GB)运行PostgreSQL 9.2.1(带有Postgis).这是我的本地OSX,带有16GB RAM,Core i7/2.5 GHz,非SSD磁盘.

I am running PostgreSQL 9.2.1 (with Postgis) with (effective_cache_size = 2GB). This is my local OSX with 16GB RAM, Core i7/2.5 GHz, non-SSD disk.

添加EXPLAIN ANALYZE输出-

Limit  (cost=71.64..71.65 rows=1 width=280) (actual time=1278.652..1278.665 rows=60 loops=1)
   ->  Sort  (cost=71.64..71.65 rows=1 width=280) (actual time=1278.651..1278.662 rows=60 loops=1)
         Sort Key: created_at
         Sort Method: top-N heapsort  Memory: 33kB
         ->  Index Scan using ix_coords on test  (cost=0.00..71.63 rows=1 width=280) (actual time=0.198..1278.227 rows=178 loops=1)
               Index Cond: (coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography)
               Filter: (((user_id)::text = '4f1092000b921a000100015c'::text) AND ('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
               Rows Removed by Filter: 3122459
 Total runtime: 1278.701 ms

更新:

根据下面的建议,我尝试在软线+ user_id上建立索引:

Based on the suggestions below I tried index on cords + user_id:

CREATE INDEX ix_coords_and_user_id ON updates USING GIST (coords, user_id);

..但是出现以下错误:

..but get the following error:

ERROR:  data type character varying has no default operator class for access method "gist"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.

更新:

因此CREATE EXTENSION btree_gist;解决了btree/gist复合索引问题.现在我的索引看起来像

So the CREATE EXTENSION btree_gist; solved the btree/gist compound index issue. And now my index looks like

CREATE INDEX ix_coords_user_id_created_at ON test USING GIST (coords, user_id, created_at);

注意:btree_gist不接受DESC/ASC.

NOTE: btree_gist does not accept DESC/ASC.

新的查询计划:

Limit  (cost=134.99..135.00 rows=1 width=280) (actual time=273.282..273.292 rows=60 loops=1)
   ->  Sort  (cost=134.99..135.00 rows=1 width=280) (actual time=273.281..273.285 rows=60 loops=1)
         Sort Key: created_at
         Sort Method: quicksort  Memory: 41kB
         ->  Index Scan using ix_updates_coords_user_id_created_at on updates  (cost=0.00..134.98 rows=1 width=280) (actual time=0.406..273.110 rows=115 loops=1)
               Index Cond: ((coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography) AND ((user_id)::text = '4e952bb5b9a77200010019ad'::text))
               Filter: (('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
               Rows Removed by Filter: 1
 Total runtime: 273.331 ms

查询的性能比以前更好,虽然提高了将近一秒,但仍然不是很好.我想这是我能得到的最好的?我希望在60-80ms左右的某个地方.同样从查询中取出order by created_at desc,也可以节省100毫秒,这意味着它无法使用索引.反正要解决这个问题吗?

The query is performing better than before, almost a second better but still not great. I guess this is the best that I can get?? I was hoping somewhere around 60-80ms. Also taking order by created_at desc from the query, shaves off another 100ms, meaning it is unable to use the index. Anyway to fix this?

推荐答案

我不知道Pg是否可以将GiST索引和常规b树索引与位图索引扫描结合在一起,但我怀疑不能.您可能会获得最佳结果,而无需在GiST索引中添加user_id列(因此,对于不使用user_id的其他查询,它会变得更大,更慢).

I don't know if Pg can combine a GiST index and regular b-tree indexes with a bitmap index scan, but I suspect not. You may be getting the best result you can without adding a user_id column to your GiST index (and consequently making it bigger and slower for other queries that don't use user_id).

作为实验,您可以:

CREATE EXTENSION btree_gist;
CREATE INDEX ix_coords_and_user_id ON test USING GIST (coords, user_id);

这可能会导致较大的索引,但可能会提高该查询的效果(如果可行).请注意,保持这样的索引会大大降低INSERTUPDATE的速度.如果您删除旧的ix_coords,即使它们未在user_id上进行过滤,您的查询也将使用ix_coords_and_user_id,但它会比ix_coords慢.同时使用这两个选项会使INSERTUPDATE的速度变慢.

which is likely to result in a big index, but might boost that query - if it works. Be aware that maintaining such an index will significantly slow INSERT and UPDATEs. If you drop the old ix_coords your queries will use ix_coords_and_user_id even if they don't filter on user_id, but it'll be slower than ix_coords. Keeping both will make the INSERT and UPDATE slowdown even worse.

请参见 btree-gist

(被问题所取代,从而完全改变了问题;撰写时,用户具有多列索引,现在已分为两个单独的索引):

您似乎不是在user_id上进行过滤或排序,而只是在create_date上进行过滤或排序. Pg不会(不能吗?)仅使用像(user_id, create_date)这样的多列索引的第二项,它也需要使用第一项.

You don't seem to be filtering or sorting on user_id, only create_date. Pg won't (can't?) use only the second term of a multi-column index like (user_id, create_date), it needs use of the first item too.

如果要索引create_date,请为其创建一个单独的索引.如果您使用并需要(user_id, create_date)索引,并且通常不单独使用user_id,请查看是否可以反转列顺序.交替创建两个独立的索引(user_id)(create_date).当需要两列时,Pg可以使用位图索引扫描将两个独立索引合并.

If you want to index create_date, create a separate index for it. If you use and need the (user_id, create_date) index and don't generally use just user_id alone, see if you can reverse the column order. Alternately create two independent indexes, (user_id) and (create_date). When both columns are needed Pg can combine the two indepependent indexes using a bitmap index scan.

这篇关于Postgres结合了多个索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆