为什么PostgreSQL(9.1)不使用索引进行简单的相等选择? [英] Why is PostgreSQL (9.1) not using index for simple equality select?

查看:125
本文介绍了为什么PostgreSQL(9.1)不使用索引进行简单的相等选择?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的桌子线索有一个索引:

\d lead
...
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id)
    "lead_account__c" btree (account__c)
    ...
    "lead_email" btree (email)
    "lead_id_prefix" btree (id text_pattern_ops)

为什么不'PG(9.1)使用索引进行这种直接的平等选择?电子邮件几乎都是独一无二的....

Why doesn't PG (9.1) use the index for this straightforward equality selection? Emails are almost all unique....

db=> explain select * from lead where email = 'blah';
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on lead  (cost=0.00..319599.38 rows=1 width=5108)
   Filter: (email = 'blah'::text)
(2 rows)

其他索引查询似乎没问题(虽然我不知道为什么这个不只是使用pkey索引):

Other index-hitting queries seem to be OK (though I don't know why this one doesn't just use the pkey index):

db=> explain select * from lead where id = '';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using lead_id_prefix on lead  (cost=0.00..8.57 rows=1 width=5108)
   Index Cond: (id = ''::text)
(2 rows)

db=> explain select * from lead where account__c = '';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using lead_account__c on lead  (cost=0.00..201.05 rows=49 width=5108)
   Index Cond: (account__c = ''::text)
(2 rows)

起初我认为可能是由于电子邮件的分辨率不够。例如,如果统计数据声称电子邮件对于大多数表来说是 blah ,那么seq扫描会更快。但事实并非如此:

At first I thought it may be due to not enough distinct values of email. For instance, if the stats claim that email is blah for most of the table, then a seq scan is faster. But that's not the case:

db=> select count(*), count(distinct email) from lead;
 count  | count
--------+--------
 749148 | 733416
(1 row)

即使我强制关闭seq扫描,计划程序也是如此表现得好像没有其他选择:

Even if I force seq scans to be off, the planner behaves as if it has no other choice:

db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

db=> explain select * from lead where email = 'foo@blah.com';
                            QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on lead  (cost=10000000000.00..10000319599.38 rows=1 width=5108)
   Filter: (email = 'foo@blah.com'::text)
(2 rows)

还试过 EXPLAIN ANALYZE

db=> explain analyze select * from lead where email = 'foo@blah.com';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on lead  (cost=10000000000.00..10000319732.76 rows=1 width=5102) (actual time=77845.244..77845.244 rows=0 loops=1)
   Filter: (email = 'foo@blah.com'::text)
 Total runtime: 77857.215 ms
(3 rows)

这是 \ d 输出(抱歉,必须隐藏列名称,并裁剪以符合SO的限制;请参阅 http://pastebin.com/ve3gzJpY ):

Here is the \d output (sorry, have to obscure the column names, and cropped to fit in SO's limits; see uncropped version at http://pastebin.com/ve3gzJpY):

                                 Table "lead"
                   Column                   |            Type             | Modifiers 
--------------------------------------------+-----------------------------+-----------
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean                     | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 email                                      | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | boolean                     | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 account__c                                 | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | text                        | 
 id                                         | text                        | not null
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
 ...
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | timestamp without time zone | 
 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | real                        | 
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id)
    "lead_account__c" btree (account__c)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_XXXXXXXXXXXXXXXXXXXXXX" btree (XXXXXXXXXXXXXXXXXXXXXX)
    "lead_email" btree (email)
    "lead_id_prefix" btree (id text_pattern_ops)

这是 pg_dump --schema-only -t lead (再次参见未加载的 http://pastebin.com/ve3gzJpY ,具有独特的列名,以防有助于重现性):

Here is pg_dump --schema-only -t lead (again see uncropped at http://pastebin.com/ve3gzJpY, with unique column names as well in case it helps reproducibility):

--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: lead; Type: TABLE; Schema: public; Owner: pod; Tablespace: 
--

CREATE TABLE lead (
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX boolean,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX date,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    account__c text,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX text,
    id text NOT NULL,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real,
    ...
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX timestamp without time zone,
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX real
);


ALTER TABLE lead OWNER TO pod;

--
-- Name: lead_pkey; Type: CONSTRAINT; Schema: public; Owner: pod; Tablespace: 
--

ALTER TABLE ONLY lead
    ADD CONSTRAINT lead_pkey PRIMARY KEY (id);


--
-- Name: lead_account__c; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_account__c ON lead USING btree (account__c);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_XXXXXXXXXXXXXXXXXXXX; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_XXXXXXXXXXXXXXXXXXXX ON lead USING btree (XXXXXXXXXXXXXXXXXXXX);


--
-- Name: lead_email; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_email ON lead USING btree (email);


--
-- Name: lead_id_prefix; Type: INDEX; Schema: public; Owner: pod; Tablespace: 
--

CREATE INDEX lead_id_prefix ON lead USING btree (id text_pattern_ops);


--
-- PostgreSQL database dump complete
--

一些PG目录咒语:

db=> select * from pg_index where indexrelid = 'lead_email'::regclass;
 indexrelid | indrelid  | indnatts | indisunique | indisprimary | indisexclusion | indimmediate | indisclustered | indisvalid | indcheckxmin | indisready | indkey | indcollation | indclass | indoption | indexprs | indpred
------------+-----------+----------+-------------+--------------+----------------+--------------+----------------+------------+--------------+------------+--------+--------------+----------+-----------+----------+---------
  215251995 | 101034456 |        1 | f           | f            | f              | t            | f              | t          | t            | t          | 101    | 100          | 10043    | 0         | ¤        | ¤
(1 row)

一些地区信息:

db=> show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

db=> show lc_ctype;
  lc_ctype   
-------------
 en_US.UTF-8
(1 row)

我搜索了大量过去的SO问题,但没有一个关于像这样的简单相等查询。

I searched over a good number of past SO questions but none were about a simple equality query like this one.

推荐答案

要解决这些问题,必须在故障排除步骤之间运行VACUUM ANALYZE以查看哪些有效。否则你可能不知道到底改变了什么。所以试试并再次运行,看看它是否解决了问题。

To troubleshoot these it is imperative that you run VACUUM ANALYSE on the table between troubleshooting steps to see what works. Otherwise you may not know exactly what changed where. So try that and run again first and see if it fixes the problem.

下一步运行(运行真空分析和每个之间的测试用例)是:

The next steps to run (run vacuum analyse and atest case between each one) are:

ALTER TABLE lead ALTER COLUMN email SET STATISTICS 1000;

也许这会解决它。也许没有。

Maybe that will fix it. Maybe not.

如果这不能解决问题,请仔细查看pg_stat视图:

If that doesn't fix it, take a close look at the pg_stat view:

SELECT * FROM pg_stat WHERE table_name = 'lead';

请仔细阅读以下内容,看看您在pg_stat中看到的是什么错误;

Please read the following thoroughly and see what you can see that is amiss in pg_stat;

http:// www。 postgresql.org/docs/9.0/static/planner-stats.html

编辑:非常清楚, vacuum analyze 不是整个故障排除。但是,必须在故障排除步骤之间运行,否则您无法确定规划人员是否考虑了正确的数据。

To be very clear, vacuum analyse is not the whole of the troubleshooting. However it MUST be run in between troubleshooting steps because otherwise you can't be sure the planner is taking into account correct data.

这篇关于为什么PostgreSQL(9.1)不使用索引进行简单的相等选择?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆