未使用Postgres的gin_trgm_ops索引 [英] Postgres `gin_trgm_ops` index not being used
问题描述
我正在尝试加速 Postgres中的某些文本匹配,使用 pg_trgm
扩展名:
I'm trying to speed up some text matching in Postgres, using the pg_trgm
extensions:
CREATE TABLE test3 (id bigint, key text, value text);
insert into test3 values (1, 'first 1', 'second 3');
insert into test3 values (2, 'first 1', 'second 2');
insert into test3 values (2, 'first 2', 'second 3');
insert into test3 values (3, 'first 1', 'second 2');
insert into test3 values (3, 'first 1', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 1', 'second 2');
insert into test3 values (4, 'first 1', 'second 2');
-- repeat the above 1,000,000x times, to have more rows for benchmarking
insert into test3(id, key, value) select id, key, value from test3 cross join generate_series(1, 1000000);
现在我用 ILIKE
查询此表:
select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
Time: 918.265 ms
要查看索引编制是否可以加快速度,我添加了<$ 键
和 value
列上的c $ c> pg_trgm :
To see if indexing would speed this up, I added pg_trgm
on both key
and value
columns:
CREATE extension if not exists pg_trgm;
CREATE INDEX test3_key_trgm_idx ON test3 USING gin (key gin_trgm_ops);
CREATE INDEX test3_value_trgm_idx ON test3 USING gin (value gin_trgm_ops);
但是查询仍然需要相同的时间,并且 EXPLAIN ANALYZE
显示完全没有使用索引:
But the query still takes the same time, and EXPLAIN ANALYZE
shows the indexes are not being used at all:
explain analyze select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=126905.14..126905.15 rows=1 width=8) (actual time=1017.666..1017.667 rows=1 loops=1)
-> Gather (cost=126904.93..126905.14 rows=2 width=8) (actual time=1017.505..1018.778 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=125904.93..125904.94 rows=1 width=8) (actual time=1010.862..1010.862 rows=1 loops=3)
-> Parallel Seq Scan on test3 (cost=0.00..122427.06 rows=1391148 width=0) (actual time=0.041..973.550 rows=666667 loops=3)
Filter: ((value ~~* '%nd 3%'::text) AND (key = 'first 1'::text))
Rows Removed by Filter: 2333336
Planning Time: 0.266 ms
Execution Time: 1018.814 ms
Time: 1049.413 ms (00:01.049)
请注意顺序扫描。
推荐答案
没关系,我发现了问题。
Never mind, I found the issue.
查询计划者比我的玩具测试集更聪明;
The query planner was smarter than my toy test set; seeing as most rows match the query, it went for a sequential scan.
如果我尝试使用 ilike'%nd 0%'
,没有行匹配,并且EXPLAIN ANALYZE报告正确地对test3_value_trgm_idx
进行位图索引扫描。
If I try with ilike '%nd 0%'
instead, no rows match and EXPLAIN ANALYZE reports Bitmap Index Scan on test3_value_trgm_idx
correctly.
因此,以这种方式规范化原始JSONB是可行的。但是,我还将尝试查找和比较另一种方法,即使用 TEXT
上的正则表达式,以避免不得不创建和维护另一个表。
So, normalizing the original JSONB in this manner works. But I'll also try to find and compare another way, using regular expressions over TEXT
, to avoid having to create and maintain another table.
这篇关于未使用Postgres的gin_trgm_ops索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!