PostgreSQL不使用部分索引 [英] PostgreSQL does not use a partial index

查看:226
本文介绍了PostgreSQL不使用部分索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在PostgreSQL 9.2中有一个表,它有一个 text 列。让我们调用 text_col 。此列中的值相当独特(最多可包含5-6个重复项)。表有约5百万行。这些行的大约一半包含 text_col null 值。当我执行以下查询时,我期望1-5行。在大多数情况下(> 80%)我只预期1行。

I have a table in PostgreSQL 9.2 that has a text column. Let's call this text_col. The values in this column are fairly unique (may contain 5-6 duplicates at the most). The table has ~5 million rows. About half these rows contain a null value for text_col. When I execute the following query I expect 1-5 rows. In most cases (>80%) I only expect 1 row.

explain analyze SELECT col1,col2.. colN
FROM table 
WHERE text_col = 'my_value';

A btree 索引存在于 text_col 。这个索引从来不被查询计划器使用,我不知道为什么。这是查询的输出。

A btree index exists on text_col. This index is never used by the query planner and I am not sure why. This is the output of the query.

Seq Scan on two (cost=0.000..459573.080 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
Filter: (victor = 'foxtrot'::text)
Rows Removed by Filter: 4077384

我添加了另一个部分索引,尝试过滤出not null,但是没有帮助(有或没有 text_pattern_ops 。我不需要 text_pattern_ops 考虑没有 LIKE 条件表示在我的查询中,但它们也匹配等号)。

I added another partial index to try to filter out those values that were not null, but that did not help (with or without text_pattern_ops. I do not need text_pattern_ops considering no LIKE conditions are expressed in my queries, but they also match equality).

CREATE INDEX name_idx
  ON table
  USING btree
  (text_col COLLATE pg_catalog."default" text_pattern_ops)
  WHERE text_col IS NOT NULL;

使用禁用序列扫描set enable_seqscan = off; 使计划员仍然通过 index_scan 选择 seqscan 。总结...

Disabling sequence scans using set enable_seqscan = off; makes the planner still pick the seqscan over an index_scan. In summary...


  1. 此查询返回的行数很小。

  2. 非空行非常独特,对文本的索引扫描应该更快。

  3. 清理和分析表格没有帮助优化器选择索引。



我的问题



My questions


  1. 为什么数据库选择序列扫描

  2. 当一个表有一个文本列,其相等条件应该被检查,有没有我可以坚持的最佳实践?

  3. 如何减少此查询所需的时间?



]




  1. 在我的本地数据库中检索到索引扫描,其中包含大约10%的生产数据。


推荐答案

partial index 是一个好主意,可以排除您显然不需要的表格的一半行。简单:

A partial index is a good idea to exclude half the rows of the table which you obviously do not need. Simpler:

CREATE INDEX name_idx ON table (text_col)
WHERE text_col IS NOT NULL;

创建后必须运行 ANALYZE表索引。 (Autovacuum会在一段时间后自动执行,如果你不手动做,但如果你在创建后立即测试,你的测试将失败。)

Be sure to run ANALYZE table after creating the index. (Autovacuum does that automatically after some time if you don't do it manually, but if you test right after creation, your test will fail.)

然后,说服查询计划器可以使用特定的部分索引,在查询中重复 WHERE 条件 - 即使它似乎完全多余:

Then, to convince the query planner that a particular partial index can be used, repeat the WHERE condition in the query - even if it seems completely redundant:

SELECT col1,col2, .. colN
FROM   table 
WHERE  text_col = 'my_value'
AND   text_col IS NOT NULL;  -- repeat condition

Voilá。

根据文档


但是,请记住,谓词必须匹配条件
用于应该从索引中获益的查询。要成为
精确,只有当系统可以
识别出查询的 WHERE 条件在数学上隐含$时,才可以在查询中使用部分索引b $ b索引的谓词。 PostgreSQL没有一个复杂的
定理证明器,它可以识别以不同形式编写的数学等价的
表达式。 (不仅是这样的
一般定理证明器极难创建,它将
可能太慢以至于没有任何实际使用。)系统可以识别
简单的不等式影响,例如x <1表示x <2;
否则谓词条件必须与
查询的 WHERE 条件完全匹配,否则索引不会被识别为可用。
匹配发生在查询计划时间,而不是运行时。作为
结果,参数化查询子句不能使用部分索引。

However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, parameterized query clauses do not work with a partial index.

部分索引的(冗余)谓词作为附加的常量 WHERE 条件,并且它工作正常。

As for parameterized queries: again, add the (redundant) predicate of the partial index as an additional, constant WHERE condition, and it works just fine.

Postgres 9.6 中的重要更新大大提高了 仅索引扫描 (这可以使查询更便宜,查询计划程序将更容易选择此类查询计划)。相关:

An important update in Postgres 9.6 largely improves chances for index-only scans (which can make queries cheaper and the query planner will more readily chose such query plans). Related:

  • PostgreSQL not using index during count(*)

这篇关于PostgreSQL不使用部分索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆