PostgreSQL 不使用部分索引 [英] PostgreSQL does not use a partial index

查看:38
本文介绍了PostgreSQL 不使用部分索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 PostgreSQL 9.2 中有一个表,它有一个 text 列.我们称之为text_col.此列中的值相当独特(最多可能包含 5-6 个重复项).该表有约 500 万行.这些行大约有一半包含 text_colnull 值.当我执行以下查询时,我希望有 1-5 行.在大多数情况下(> 80%)我只希望有 1 行.

I have a table in PostgreSQL 9.2 that has a text column. Let's call this text_col. The values in this column are fairly unique (may contain 5-6 duplicates at the most). The table has ~5 million rows. About half these rows contain a null value for text_col. When I execute the following query I expect 1-5 rows. In most cases (>80%) I only expect 1 row.

explain analyze SELECT col1,col2.. colN
FROM table 
WHERE text_col = 'my_value';

text_col 上存在 btree 索引.查询规划器从不使用此索引,我不知道为什么.这是查询的输出.

A btree index exists on text_col. This index is never used by the query planner and I am not sure why. This is the output of the query.

Seq Scan on two (cost=0.000..459573.080 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
Filter: (victor = 'foxtrot'::text)
Rows Removed by Filter: 4077384

我添加了另一个部分索引来尝试过滤掉那些不为空的值,但这没有帮助(有或没有 text_pattern_ops.我不需要 text_pattern_ops考虑到我的查询中没有表达 LIKE 条件,但它们也匹配相等性).

I added another partial index to try to filter out those values that were not null, but that did not help (with or without text_pattern_ops. I do not need text_pattern_ops considering no LIKE conditions are expressed in my queries, but they also match equality).

CREATE INDEX name_idx
  ON table
  USING btree
  (text_col COLLATE pg_catalog."default" text_pattern_ops)
  WHERE text_col IS NOT NULL;

使用 set enable_seqscan = off; 禁用序列扫描使规划器仍然选择 seqscan 而不是 index_scan.总之……

Disabling sequence scans using set enable_seqscan = off; makes the planner still pick the seqscan over an index_scan. In summary...

  1. 此查询返回的行数很少.
  2. 鉴于非空行相当独特,对文本的索引扫描应该更快.
  3. 清理和分析表并没有帮助优化器选择索引.

我的问题

  1. 为什么数据库选择序列扫描而不是索引扫描?
  2. 如果表格有一个文本列,应检查其相等条件,我可以遵循哪些最佳实践?
  3. 如何减少此查询所用的时间?

  1. 索引扫描是在我的本地数据库上进行的,该数据库包含生产中可用数据的大约 10%.

推荐答案

A 部分索引 是一个好主意 排除表中您显然不需要的一半行.更简单:

A partial index is a good idea to exclude half the rows of the table which you obviously do not need. Simpler:

CREATE INDEX name_idx ON table (text_col)
WHERE text_col IS NOT NULL;

确保在创建索引后运行ANALYZE table.(如果您不手动执行,Autovacuum 会在一段时间后自动执行此操作,但如果您在创建后立即进行测试,您的测试将失败.)

Be sure to run ANALYZE table after creating the index. (Autovacuum does that automatically after some time if you don't do it manually, but if you test right after creation, your test will fail.)

然后,为了让查询规划器相信可以使用特定的部分索引,请在查询中重复 WHERE 条件 - 即使它看起来完全多余:

Then, to convince the query planner that a particular partial index can be used, repeat the WHERE condition in the query - even if it seems completely redundant:

SELECT col1,col2, .. colN
FROM   table 
WHERE  text_col = 'my_value'
AND   text_col IS NOT NULL;  -- repeat condition

瞧.

每个文档:

但是,请记住谓词必须匹配条件在应该从索引中受益的查询中使用.成为精确,只有当系统可以时,才能在查询中使用部分索引认识到查询的 WHERE 条件在数学上暗示索引的谓词.PostgreSQL 没有复杂的可以识别数学等价的定理证明器以不同形式书写的表达式.(不仅是这样的一般定理证明器极难创建,它会可能太慢而没有任何实际用途.)系统可以识别简单的不等式含义,例如x <1"意味着x <2";否则谓词条件必须完全匹配查询的 WHERE 条件或索引将不被识别为可用.匹配发生在查询计划时,而不是运行时.作为一个结果,参数化查询子句不适用于部分索引.

However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, parameterized query clauses do not work with a partial index.

对于参数化查询:同样,添加部分索引的(冗余)谓词作为附加的、恒定的 WHERE 条件,它工作得很好.

As for parameterized queries: again, add the (redundant) predicate of the partial index as an additional, constant WHERE condition, and it works just fine.

Postgres 9.6 中的一项重要更新大大提高了 仅索引扫描(这可以使查询更便宜,查询计划器将更容易选择此类查询计划).相关:

An important update in Postgres 9.6 largely improves chances for index-only scans (which can make queries cheaper and the query planner will more readily chose such query plans). Related:

这篇关于PostgreSQL 不使用部分索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆