为什么PostgreSQL（9.1）不使用索引进行简单的相等选择？ [英] Why is PostgreSQL (9.1) not using index for simple equality select?

查看：125 发布时间：2018/8/2 15:23:32 postgresql indexing postgresql-9.1

本文介绍了为什么PostgreSQL（9.1）不使用索引进行简单的相等选择？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的桌子线索有一个索引：

\d lead
...
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id)
    "lead_account__c" btree (account__c)
    ...
    "lead_email" btree (email)
    "lead_id_prefix" btree (id text_pattern_ops)

为什么不'PG（9.1）使用索引进行这种直接的平等选择？电子邮件几乎都是独一无二的....

Why doesn't PG (9.1) use the index for this straightforward equality selection? Emails are almost all unique....

db=> explain select * from lead where email = 'blah';
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on lead  (cost=0.00..319599.38 rows=1 width=5108)
   Filter: (email = 'blah'::text)
(2 rows)

其他索引查询似乎没问题（虽然我不知道为什么这个不只是使用pkey索引）：

Other index-hitting queries seem to be OK (though I don't know why this one doesn't just use the pkey index):

db=> explain select * from lead where id = '';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using lead_id_prefix on lead  (cost=0.00..8.57 rows=1 width=5108)
   Index Cond: (id = ''::text)
(2 rows)

db=> explain select * from lead where account__c = '';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using lead_account__c on lead  (cost=0.00..201.05 rows=49 width=5108)
   Index Cond: (account__c = ''::text)
(2 rows)

起初我认为可能是由于电子邮件的分辨率不够。例如，如果统计数据声称电子邮件对于大多数表来说是 blah ，那么seq扫描会更快。但事实并非如此：

At first I thought it may be due to not enough distinct values of email. For instance, if the stats claim that email is blah for most of the table, then a seq scan is faster. But that's not the case:

db=> select count(*), count(distinct email) from lead;
 count  | count
--------+--------
 749148 | 733416
(1 row)

即使我强制关闭seq扫描，计划程序也是如此表现得好像没有其他选择：

Even if I force seq scans to be off, the planner behaves as if it has no other choice:

db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

db=> explain select * from lead where email = 'foo@blah.com';
                            QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on lead  (cost=10000000000.00..10000319599.38 rows=1 width=5108)
   Filter: (email = 'foo@blah.com'::text)
(2 rows)

还试过 EXPLAIN ANALYZE ：

db=> explain analyze select * from lead where email = 'foo@blah.com';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on lead  (cost=10000000000.00..10000319732.76 rows=1 width=5102) (actual time=77845.244..77845.244 rows=0 loops=1)
   Filter: (email = 'foo@blah.com'::text)
 Total runtime: 77857.215 ms
(3 rows)

这是 \ d 输出（抱歉，必须隐藏列名称，并裁剪以符合SO的限制;请参阅 http://pastebin.com/ve3gzJpY ）：

Here is the \d output (sorry, have to obscure the column names, and cropped to fit in SO's limits; see uncropped version at http://pastebin.com/ve3gzJpY):

                                 Table "lead"
                   Column                   |            Type             | Modifiers 
--------------------------------------------+-----------------------------+-----------
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean                     | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 email                                      | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean                     | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 account__c                                 | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 id                                         | text                        | not null
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | timestamp without time zone | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id)
    "lead_account__c" btree (account__c)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_email" btree (email)
    "lead_id_prefix" btree (id text_pattern_ops)

这是 pg_dump --schema-only -t lead （再次参见未加载的 http://pastebin.com/ve3gzJpY ，具有独特的列名，以防有助于重现性）：

Here is pg_dump --schema-only -t lead (again see uncropped at http://pastebin.com/ve3gzJpY, with unique column names as well in case it helps reproducibility):

--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: lead; Type: TABLE; Schema: public; Owner: pod; Tablespace: 
--

CREATE TABLE lead (
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX boolean,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX date,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    account__c text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    id text NOT NULL,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX timestamp without time zone,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real
);


ALTER TABLE lead OWNER TO pod;

--
-- Name: lead_pkey; Type: CONSTRAINT; Schema: public; Owner: pod; Tablespace: 
--

ALTER TABLE ONLY lead
    ADD CONSTRAINT lead_pkey PRIMARY KEY (id);


--
-- Name: lead_account__c; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_account__c ON lead USING btree (account__c);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_email; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_email ON lead USING btree (email);


--
-- Name: lead_id_prefix; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);


--
-- PostgreSQL database dump complete
--

一些PG目录咒语：

db=> select * from pg_index where indexrelid = 'lead_email'::regclass;
 indexrelid | indrelid  | indnatts | indisunique | indisprimary | indisexclusion | indimmediate | indisclustered | indisvalid | indcheckxmin | indisready | indkey | indcollation | indclass | indoption | indexprs | indpred
------------+-----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+--------+--------------+----------+-----------+----------+---------
  215251995 | 101034456 |        1 | f           | f            | f              | t            | f              | t          | t            | t          | 101    | 100          | 10043    | 0         | ¤        | ¤
(1 row)

一些地区信息：

db=> show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

db=> show lc_ctype;
  lc_ctype   
-------------
 en_US.UTF-8
(1 row)

我搜索了大量过去的SO问题，但没有一个关于像这样的简单相等查询。

I searched over a good number of past SO questions but none were about a simple equality query like this one.

为什么PostgreSQL（9.1）不使用索引进行简单的相等选择？ [英] Why is PostgreSQL (9.1) not using index for simple equality select?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

为什么PostgreSQL（9.1）不使用索引进行简单的相等选择？ [英] Why is PostgreSQL (9.1) not using index for simple equality select?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭