PostgreSQL不使用部分索引 [英] PostgreSQL does not use a partial index
问题描述
我在PostgreSQL 9.2中有一个表,它有一个 text
列。让我们调用 text_col
。此列中的值相当独特(最多可包含5-6个重复项)。表有约5百万行。这些行的大约一半包含 text_col
的 null
值。当我执行以下查询时,我期望1-5行。在大多数情况下(> 80%)我只预期1行。
I have a table in PostgreSQL 9.2 that has a text
column. Let's call this text_col
. The values in this column are fairly unique (may contain 5-6 duplicates at the most). The table has ~5 million rows. About half these rows contain a null
value for text_col
. When I execute the following query I expect 1-5 rows. In most cases (>80%) I only expect 1 row.
explain analyze SELECT col1,col2.. colN
FROM table
WHERE text_col = 'my_value';
A btree
索引存在于 text_col
。这个索引从来不被查询计划器使用,我不知道为什么。这是查询的输出。
A btree
index exists on text_col
. This index is never used by the query planner and I am not sure why. This is the output of the query.
Seq Scan on two (cost=0.000..459573.080 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
Filter: (victor = 'foxtrot'::text)
Rows Removed by Filter: 4077384
我添加了另一个部分索引,尝试过滤出not null,但是没有帮助(有或没有 text_pattern_ops
。我不需要 text_pattern_ops
考虑没有 LIKE
条件表示在我的查询中,但它们也匹配等号)。
I added another partial index to try to filter out those values that were not null, but that did not help (with or without text_pattern_ops
. I do not need text_pattern_ops
considering no LIKE
conditions are expressed in my queries, but they also match equality).
CREATE INDEX name_idx
ON table
USING btree
(text_col COLLATE pg_catalog."default" text_pattern_ops)
WHERE text_col IS NOT NULL;
使用禁用序列扫描set enable_seqscan = off;
使计划员仍然通过 index_scan
选择 seqscan
。总结...
Disabling sequence scans using set enable_seqscan = off;
makes the planner still pick the seqscan
over an index_scan
. In summary...
- 此查询返回的行数很小。
- 非空行非常独特,对文本的索引扫描应该更快。
- 清理和分析表格没有帮助优化器选择索引。
我的问题
My questions
- 为什么数据库选择序列扫描
- 当一个表有一个文本列,其相等条件应该被检查,有没有我可以坚持的最佳实践?
- 如何减少此查询所需的时间?
]
- 在我的本地数据库中检索到索引扫描,其中包含大约10%的生产数据。
推荐答案
partial index 是一个好主意,可以排除您显然不需要的表格的一半行。简单:
A partial index is a good idea to exclude half the rows of the table which you obviously do not need. Simpler:
CREATE INDEX name_idx ON table (text_col)
WHERE text_col IS NOT NULL;
创建后必须运行 ANALYZE表
索引。 (Autovacuum会在一段时间后自动执行,如果你不手动做,但如果你在创建后立即测试,你的测试将失败。)
Be sure to run ANALYZE table
after creating the index. (Autovacuum does that automatically after some time if you don't do it manually, but if you test right after creation, your test will fail.)
然后,说服查询计划器可以使用特定的部分索引,在查询中重复 WHERE
条件 - 即使它似乎完全多余:
Then, to convince the query planner that a particular partial index can be used, repeat the WHERE
condition in the query - even if it seems completely redundant:
SELECT col1,col2, .. colN
FROM table
WHERE text_col = 'my_value'
AND text_col IS NOT NULL; -- repeat condition
Voilá。
根据文档:
但是,请记住,谓词必须匹配条件
用于应该从索引中获益的查询。要成为
精确,只有当系统可以
识别出查询的WHERE
条件在数学上隐含$时,才可以在查询中使用部分索引b $ b索引的谓词。 PostgreSQL没有一个复杂的
定理证明器,它可以识别以不同形式编写的数学等价的
表达式。 (不仅是这样的
一般定理证明器极难创建,它将
可能太慢以至于没有任何实际使用。)系统可以识别
简单的不等式影响,例如x <1表示x <2;
否则谓词条件必须与
查询的WHERE
条件完全匹配,否则索引不会被识别为可用。
匹配发生在查询计划时间,而不是运行时。作为
结果,参数化查询子句不能使用部分索引。
However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the
WHERE
condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query'sWHERE
condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, parameterized query clauses do not work with a partial index.
部分索引的(冗余)谓词作为附加的常量 WHERE
条件,并且它工作正常。
As for parameterized queries: again, add the (redundant) predicate of the partial index as an additional, constant WHERE
condition, and it works just fine.
Postgres 9.6 中的重要更新大大提高了 仅索引扫描 (这可以使查询更便宜,查询计划程序将更容易选择此类查询计划)。相关:
An important update in Postgres 9.6 largely improves chances for index-only scans (which can make queries cheaper and the query planner will more readily chose such query plans). Related:
- PostgreSQL not using index during count(*)
这篇关于PostgreSQL不使用部分索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!