postgres不使用索引 [英] postgres not using index

查看:96
本文介绍了postgres不使用索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

关于此主题的问题很多,但所有这些问题似乎都比我目前正在讨论的情况更为复杂,答案似乎并不适用.

There are lots of questions on this topic, but all of them seem to be more complex cases than what I'm looking at at the moment and the answers don't seem applicable.

OHDSI=> \d record_counts
               Table "results2.record_counts"
         Column         |         Type          | Modifiers
------------------------+-----------------------+-----------
 concept_id             | integer               |
 schema                 | text                  |
 table_name             | text                  |
 column_name            | text                  |
 column_type            | text                  |
 descendant_concept_ids | bigint                |
 rc                     | numeric               |
 drc                    | numeric               |
 domain_id              | character varying(20) |
 vocabulary_id          | character varying(20) |
 concept_class_id       | character varying(20) |
 standard_concept       | character varying(1)  |
Indexes:
    "rc_dom" btree (domain_id, concept_id)
    "rcdom" btree (domain_id)
    "rcdomvocsc" btree (domain_id, vocabulary_id, standard_concept)

该表具有3,133,778条记录,因此Postgres不应因为表较小而忽略索引.

The table has 3,133,778 records, so Postgres shouldn't be ignoring the index because of small table size.

我对domain_id进行了过滤,该索引已建立索引,并且索引被忽略:

I filter on domain_id, which is indexed, and the index is ignored:

OHDSI=> explain select * from record_counts where domain_id = 'Drug';
                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on record_counts  (cost=0.00..76744.81 rows=2079187 width=87)
   Filter: ((domain_id)::text = 'Drug'::text)

我关闭seqscan,然后:

I turn off seqscan and:

OHDSI=> set enable_seqscan=false;
SET
OHDSI=> explain select * from record_counts where domain_id = 'Drug';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on record_counts  (cost=42042.13..105605.97 rows=2079187 width=87)
   Recheck Cond: ((domain_id)::text = 'Drug'::text)
   ->  Bitmap Index Scan on rcdom  (cost=0.00..41522.33 rows=2079187 width=0)
         Index Cond: ((domain_id)::text = 'Drug'::text)

实际上,该计划表示使用索引比不使用索引要昂贵得多,但是为什么呢?如果索引允许它处理更少的记录,那么使用它应该不更快吗?

Indeed, the plan says it's going to be more expensive to use the index than not, but why? If the index lets it handle many fewer records, shouldn't it be quicker to use it?

推荐答案

好吧,看来Postgres知道自己在做什么.我正在使用的索引列的特殊值("Drug")恰好占表中行的66%.因此,是的,过滤器使行集明显变小,但是由于这些行将分散在页面之间,因此索引不允许更快地检索它们.

Ok, it looks like Postgres knew what it was doing. The particular value of the indexed column I was using ('Drug') happened to account for 66% of the rows in the table. So, yes, the filter makes the row set significantly smaller, but since those rows would be scattered between pages, the index doesn't allow them to be retrieved faster.

OHDSI=> select domain_id, count(*) as rows, round((100 * count(*)::float / 3133778.0)::numeric,4) pct from record_counts group by 1 order by 2 desc;
      domain_id      |  rows   |   pct
---------------------+---------+---------
 Drug                | 2074991 | 66.2137
 Condition           |  466882 | 14.8984
 Observation         |  217807 |  6.9503
 Procedure           |  165800 |  5.2907
 Measurement         |  127239 |  4.0602
 Device              |   29410 |  0.9385
 Spec Anatomic Site  |   28783 |  0.9185
 Meas Value          |   10415 |  0.3323
 Unit                |    2350 |  0.0750
 Type Concept        |    2170 |  0.0692
 Provider Specialty  |    1957 |  0.0624
 Specimen            |    1767 |  0.0564
 Metadata            |    1689 |  0.0539
 Revenue Code        |     538 |  0.0172
 Place of Service    |     480 |  0.0153
 Race                |     467 |  0.0149
 Relationship        |     242 |  0.0077
 Condition/Obs       |     182 |  0.0058
 Currency            |     180 |  0.0057
 Condition/Meas      |     115 |  0.0037
 Route               |      81 |  0.0026
 Obs/Procedure       |      78 |  0.0025
 Condition/Device    |      52 |  0.0017
 Condition/Procedure |      25 |  0.0008
 Meas/Procedure      |      25 |  0.0008
 Gender              |      19 |  0.0006
 Device/Procedure    |       9 |  0.0003
 Meas Value Operator |       9 |  0.0003
 Visit               |       8 |  0.0003
 Drug/Procedure      |       3 |  0.0001
 Spec Disease Status |       3 |  0.0001
 Ethnicity           |       2 |  0.0001

当我在where子句中使用任何其他值(包括条件",具有15%的行)时,Postgres会使用索引.

When I use any other value in the where clause (including 'Condition', with 15% of the rows), Postgres does use the index.

(令人惊讶的是,即使我基于domain_id索引对表进行了聚类,当我对"Drug"进行过滤时,它仍然不使用该索引,但是过滤掉34%的行的性能并未提高似乎值得进一步追求.)

(Somewhat surprisingly, even after I cluster the table based on the domain_id index, it still doesn't use the index when I filter on 'Drug', but the performance improvement for filtering out 34% of the rows doesn't seem worth pursuing this further.)

这篇关于postgres不使用索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆