未使用Postgres的gin_trgm_ops索引 [英] Postgres `gin_trgm_ops` index not being used

查看:1003
本文介绍了未使用Postgres的gin_trgm_ops索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试加速 Postgres中的某些文本匹配,使用 pg_trgm 扩展名:

I'm trying to speed up some text matching in Postgres, using the pg_trgm extensions:

CREATE TABLE test3 (id bigint, key text, value text);

insert into test3 values (1, 'first 1', 'second 3');
insert into test3 values (2, 'first 1', 'second 2');
insert into test3 values (2, 'first 2', 'second 3');
insert into test3 values (3, 'first 1', 'second 2');
insert into test3 values (3, 'first 1', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 1', 'second 2');
insert into test3 values (4, 'first 1', 'second 2');

-- repeat the above 1,000,000x times, to have more rows for benchmarking
insert into test3(id, key, value) select id, key, value from test3 cross join generate_series(1, 1000000);

现在我用 ILIKE 查询此表:

select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
Time: 918.265 ms

要查看索引编制是否可以加快速度,我添加了<$ value 列上的c $ c> pg_trgm :

To see if indexing would speed this up, I added pg_trgm on both key and value columns:

CREATE extension if not exists pg_trgm;
CREATE INDEX test3_key_trgm_idx ON test3 USING gin (key gin_trgm_ops);
CREATE INDEX test3_value_trgm_idx ON test3 USING gin (value gin_trgm_ops);

但是查询仍然需要相同的时间,并且 EXPLAIN ANALYZE 显示完全没有使用索引:

But the query still takes the same time, and EXPLAIN ANALYZE shows the indexes are not being used at all:

explain analyze select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=126905.14..126905.15 rows=1 width=8) (actual time=1017.666..1017.667 rows=1 loops=1)
   ->  Gather  (cost=126904.93..126905.14 rows=2 width=8) (actual time=1017.505..1018.778 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=125904.93..125904.94 rows=1 width=8) (actual time=1010.862..1010.862 rows=1 loops=3)
               ->  Parallel Seq Scan on test3  (cost=0.00..122427.06 rows=1391148 width=0) (actual time=0.041..973.550 rows=666667 loops=3)
                     Filter: ((value ~~* '%nd 3%'::text) AND (key = 'first 1'::text))
                     Rows Removed by Filter: 2333336
 Planning Time: 0.266 ms
 Execution Time: 1018.814 ms

Time: 1049.413 ms (00:01.049)

请注意顺序扫描。

推荐答案

没关系,我发现了问题。

Never mind, I found the issue.

查询计划者比我的玩具测试集更聪明;

The query planner was smarter than my toy test set; seeing as most rows match the query, it went for a sequential scan.

如果我尝试使用 ilike'%nd 0%',没有行匹配,并且EXPLAIN ANALYZE报告正确地对test3_value_trgm_idx 进行位图索引扫描。

If I try with ilike '%nd 0%' instead, no rows match and EXPLAIN ANALYZE reports Bitmap Index Scan on test3_value_trgm_idx correctly.

因此,以这种方式规范化原始JSONB是可行的。但是,我还将尝试查找和比较另一种方法,即使用 TEXT 上的正则表达式,以避免不得不创建和维护另一个表。

So, normalizing the original JSONB in this manner works. But I'll also try to find and compare another way, using regular expressions over TEXT, to avoid having to create and maintain another table.

这篇关于未使用Postgres的gin_trgm_ops索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆