无法使用索引? [英] Unable to use index?

查看:51
本文介绍了无法使用索引?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

嗨大家好!


我正在运行的查询似乎没有使用可用的索引

(运行版本7.4.2)。我有下表:


=> \d复制

表public.replicated

列|类型|

修饰符

----------------- + ------------- ------------- + ------------------------------------ -----------------

rep_id | bigint | not null default nextval(''replicated_id_seq'':: text)

rep_component |字符变化(100)|

rep_key1 |整数|

rep_key2 | bigint |

rep_key3 | smallint |

rep_replicated |带时区的时间戳|

rep_remotekey1 |整数|

rep_remotekey2 | bigint |

rep_remotekey3 | smallint |

rep_key2b | bigint |

rep_remotekey2b | bigint |

rep_key4 |文字|

索引:

" replicated_pkey"主键,btree(rep_id)

" replicate_key1_idx" btree(rep_key1,rep_key2,rep_key3)

" replicated_item2_idx" btree(rep_component,rep_key2,rep_key3)

" replicated_item_idx" btree(rep_component,rep_key1,rep_key2,rep_key3)

" replicated_key2_idx" btree(rep_key2,rep_key3)

" replicated_key4_idx" btree(rep_key4)


=>分析详细复制;

信息:分析public.replicated

信息:复制:362140页,30000行抽样,45953418估计

总行数

分析


以下不使用索引,即使两个可用于

特定选择rep_component。


=>解释分析select * from replicated where rep_component =''ps_probe''limit 1;

QUERY PLAN

--------------- -------------------------------------------------- -------------------------------------------------- ------------

限制(成本= 0.00..0.23行= 1宽度= 101)(实际时间= 34401.857..34401.859行= 1个循环= 1)

- >复制时的Seq扫描(成本= 0.00..936557.70行= 4114363宽度= 101)(实际时间= 34401.849..34401.849行= 1个循环= 1)

过滤器:((rep_component):: text =''ps_probe'':: text)

总运行时间:34401.925 ms

(4行)


但是,如果我执行以下操作,将使用索引,并且它运行得更快

更快(即使我交换了执行顺序)。


=> ;解释分析select * from replicated where rep_component =''ps_probe''order by rep_component limit 1;

QUERY PLAN

------------ -------------------------------------------------- -------------------------------------------------- ---------------------------------------

限制(费用= 0.00..1.66行= 1宽度= 101)(实际时间= 51.163..51.165行= 1个循环= 1)

- >在复制时使用replicated_item2_idx进行索引扫描(成本= 0.00..6838123.76行= 4114363宽度= 101)(实际时间= 51.157..51.157行= 1个循环= 1)

索引条件:((rep_component) :: text =''ps_probe'':: text)

总运行时间:51.265毫秒

(4行)


没有选择指数的原因是什么?也许我需要增加为统计抽样的

行的数量?


问候!

Ed


---------------------------(播出结束)------------ ---------------

提示2:您可以使用取消注册命令一次性取消所有列表

(发送"取消注册YourEmailAddressHere以 ma*******@postgresql.org

解决方案

2004年4月29日星期四09:48:10 -0400(美国东部时间),Edmund Dengler

< ed *** **@eSentire.com>写道:

=>解释分析select * from replicated where rep_component =''ps_probe''limit 1;
----------------------------- -------------------------------------------------- ------------------------------------------------ <无线电通信限制(成本= 0.00..0.23行= 1宽度= 101)(实际时间= 34401.857..34401.859行= 1个循环= 1)
- >复制时的Seq扫描(成本= 0.00..936557.70行= 4114363宽度= 101)(实际时间= 34401.849..34401.849行= 1个循环= 1)
^^^^过滤器:((rep_component):: text =''ps_probe'':: text)


计划员认为seq扫描的启动成本为0.00,即

它可以返回立即获得第一个元组,在过滤条件存在的情况下,这显然不是真正的b $ b。不幸的是,没有

简单的方法可以解决这个问题,因为统计信息没有关于具有特定vaules的元组物理位置的信息。

=>解释分析select * from replicated where rep_component =''ps_probe''order by rep_component limit 1;


这是一个很好的解决方法。它使seq扫描的计划看起来像


|限制(成本= 2345679.00..2345679.20行= 1宽度= 101)

| - >排序(2345678.90..2500000.00行= 4114363宽度= 101)

| - > Seq Scan on replicated(成本= 0.00..936557.70行= 4114363宽度= 101)

|过滤:((rep_component):: text =''ps_probe'':: text)


这是对索引扫描的输家:

限制( cost = 0.00..1.66 rows = 1 width = 101)(实际时间= 51.163..51.165行= 1个循环= 1)
也许我需要增加为统计抽样的行数?




没有帮助,恕我直言。


Servus

Manfred


---------------------------(广播结束)--------------- ------------

提示6:您是否搜索了我们的列表档案?

http://archives.postgresql.org


On Thu,29 2004年4月09:48:10 -0400(美国东部时间),Edmund Dengler

< ed ***** @ eSentire.com>写道:

=>解释分析select * from replicated where rep_component =''ps_probe''limit 1;
----------------------------- -------------------------------------------------- ------------------------------------------------ <无线电通信限制(成本= 0.00..0.23行= 1宽度= 101)(实际时间= 34401.857..34401.859行= 1个循环= 1)
- >复制时的Seq扫描(成本= 0.00..936557.70行= 4114363宽度= 101)(实际时间= 34401.849..34401.849行= 1个循环= 1)
^^^^过滤器:((rep_component):: text =''ps_probe'':: text)


计划员认为seq扫描的启动成本为0.00,即

它可以返回立即获得第一个元组,在过滤条件存在的情况下,这显然不是真正的b $ b。不幸的是,没有

简单的方法可以解决这个问题,因为统计信息没有关于具有特定vaules的元组物理位置的信息。

=>解释分析select * from replicated where rep_component =''ps_probe''order by rep_component limit 1;


这是一个很好的解决方法。它使seq扫描的计划看起来像


|限制(成本= 2345679.00..2345679.20行= 1宽度= 101)

| - >排序(2345678.90..2500000.00行= 4114363宽度= 101)

| - > Seq Scan on replicated(成本= 0.00..936557.70行= 4114363宽度= 101)

|过滤:((rep_component):: text =''ps_probe'':: text)


这是对索引扫描的输家:

限制( cost = 0.00..1.66 rows = 1 width = 101)(实际时间= 51.163..51.165行= 1个循环= 1)
也许我需要增加为统计抽样的行数?




没有帮助,恕我直言。


Servus

Manfred


---------------------------(广播结束)--------------- ------------

提示6:您是否搜索了我们的列表档案?

http://archives.postgresql.org


嗯,有趣的是我有那个表格从

rep_component开始聚集,所以''ps_probe''肯定会出现在后续的

扫描中。那么为什么< order by>强制使用索引?


问候!

Ed


2004年4月29日星期四,汤姆Lane写道:

Manfred Koizar< mk ***** @ aon.at>写道:

计划者认为seq扫描的启动成本为0.00,即它可以立即返回第一个元组,这在存在时显然不是真的过滤条件。



不是真的 - 在我们开始扫描结果之前,启动成本确实被定义为成本消耗
。选择N元组的估计成本实际上是startup_cost + N *(total_cost-startup_cost)/ M,其中M是返回的估计总行数。这就是为什么LIMIT
显示获取1行的成本的非零估计。

不幸的是,没有简单的方法可以解决这个问题,因为统计信息没有关于具有某些vaules的元组的物理位置的信息。



是的,我认为真正的问题是所需的行不是
均匀分布,实际上
表的开头附近没有。我们没有保持足够详细的统计数据让计划者发现这个,所以它必须根据均匀分布的假设进行估计。
在这个假设下,它看起来像一个seqscan会击中一个合适的元组
问候,tom lane




---------- -----------------(广播结束)---------------------------

提示3:如果通过Usenet发布/阅读,请发送适当的

subscribe-nomail命令到 ma ******* @ postgresql.org ,以便您的

消息可以干净地通过邮件列表


Hi folks!

A query I am running does not seem to use indexes that are available
(running version 7.4.2). I have the following table:

=> \d replicated
Table "public.replicated"
Column | Type |
Modifiers
-----------------+--------------------------+-----------------------------------------------------
rep_id | bigint | not null default nextval(''replicated_id_seq''::text)
rep_component | character varying(100) |
rep_key1 | integer |
rep_key2 | bigint |
rep_key3 | smallint |
rep_replicated | timestamp with time zone |
rep_remotekey1 | integer |
rep_remotekey2 | bigint |
rep_remotekey3 | smallint |
rep_key2b | bigint |
rep_remotekey2b | bigint |
rep_key4 | text |
Indexes:
"replicated_pkey" primary key, btree (rep_id)
"replicate_key1_idx" btree (rep_key1, rep_key2, rep_key3)
"replicated_item2_idx" btree (rep_component, rep_key2, rep_key3)
"replicated_item_idx" btree (rep_component, rep_key1, rep_key2, rep_key3)
"replicated_key2_idx" btree (rep_key2, rep_key3)
"replicated_key4_idx" btree (rep_key4)

=> analyze verbose replicated;
INFO: analyzing "public.replicated"
INFO: "replicated": 362140 pages, 30000 rows sampled, 45953418 estimated
total rows
ANALYZE

The following does not use an index, even though two are available for the
specific selection of rep_component.

=> explain analyze select * from replicated where rep_component = ''ps_probe'' limit 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.23 rows=1 width=101) (actual time=34401.857..34401.859 rows=1 loops=1)
-> Seq Scan on replicated (cost=0.00..936557.70 rows=4114363 width=101) (actual time=34401.849..34401.849 rows=1 loops=1)
Filter: ((rep_component)::text = ''ps_probe''::text)
Total runtime: 34401.925 ms
(4 rows)

Yet, if I do the following, an index will be used, and it runs much
faster (even when I swapped the order of the execution).

=> explain analyze select * from replicated where rep_component = ''ps_probe'' order by rep_component limit 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.66 rows=1 width=101) (actual time=51.163..51.165 rows=1 loops=1)
-> Index Scan using replicated_item2_idx on replicated (cost=0.00..6838123.76 rows=4114363 width=101) (actual time=51.157..51.157 rows=1 loops=1)
Index Cond: ((rep_component)::text = ''ps_probe''::text)
Total runtime: 51.265 ms
(4 rows)

Any reason why the index is not chosen? Maybe I need to up the number of
rows sampled for statistics?

Regards!
Ed

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

解决方案

On Thu, 29 Apr 2004 09:48:10 -0400 (EDT), Edmund Dengler
<ed*****@eSentire.com> wrote:

=> explain analyze select * from replicated where rep_component = ''ps_probe'' limit 1;
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.23 rows=1 width=101) (actual time=34401.857..34401.859 rows=1 loops=1)
-> Seq Scan on replicated (cost=0.00..936557.70 rows=4114363 width=101) (actual time=34401.849..34401.849 rows=1 loops=1) ^^^^ Filter: ((rep_component)::text = ''ps_probe''::text)
The planner thinks that the seq scan has a startup cost of 0.00, i.e.
that it can return the first tuple immediately, which is obviously not
true in the presence of a filter condition. Unfortunately there''s no
easy way to fix this, because the statistics information does not have
information about the physical position of tuples with certain vaules.
=> explain analyze select * from replicated where rep_component = ''ps_probe'' order by rep_component limit 1;
This is a good workaround. It makes the plan for a seq scan look like

| Limit (cost=2345679.00..2345679.20 rows=1 width=101)
| -> Sort (2345678.90..2500000.00 rows=4114363 width=101)
| -> Seq Scan on replicated (cost=0.00..936557.70 rows=4114363 width=101)
| Filter: ((rep_component)::text = ''ps_probe''::text)

which is a loser against the index scan:
Limit (cost=0.00..1.66 rows=1 width=101) (actual time=51.163..51.165 rows=1 loops=1) Maybe I need to up the number of rows sampled for statistics?



Won''t help, IMHO.

Servus
Manfred

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


On Thu, 29 Apr 2004 09:48:10 -0400 (EDT), Edmund Dengler
<ed*****@eSentire.com> wrote:

=> explain analyze select * from replicated where rep_component = ''ps_probe'' limit 1;
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.23 rows=1 width=101) (actual time=34401.857..34401.859 rows=1 loops=1)
-> Seq Scan on replicated (cost=0.00..936557.70 rows=4114363 width=101) (actual time=34401.849..34401.849 rows=1 loops=1) ^^^^ Filter: ((rep_component)::text = ''ps_probe''::text)
The planner thinks that the seq scan has a startup cost of 0.00, i.e.
that it can return the first tuple immediately, which is obviously not
true in the presence of a filter condition. Unfortunately there''s no
easy way to fix this, because the statistics information does not have
information about the physical position of tuples with certain vaules.
=> explain analyze select * from replicated where rep_component = ''ps_probe'' order by rep_component limit 1;
This is a good workaround. It makes the plan for a seq scan look like

| Limit (cost=2345679.00..2345679.20 rows=1 width=101)
| -> Sort (2345678.90..2500000.00 rows=4114363 width=101)
| -> Seq Scan on replicated (cost=0.00..936557.70 rows=4114363 width=101)
| Filter: ((rep_component)::text = ''ps_probe''::text)

which is a loser against the index scan:
Limit (cost=0.00..1.66 rows=1 width=101) (actual time=51.163..51.165 rows=1 loops=1) Maybe I need to up the number of rows sampled for statistics?



Won''t help, IMHO.

Servus
Manfred

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Hmm, interesting as I have that table clustered starting with the
rep_component, so ''ps_probe'' will definitely appear later in a sequential
scan. So why does the <order by> force the use of the index?

Regards!
Ed

On Thu, 29 Apr 2004, Tom Lane wrote:

Manfred Koizar <mk*****@aon.at> writes:

The planner thinks that the seq scan has a startup cost of 0.00, i.e.
that it can return the first tuple immediately, which is obviously not
true in the presence of a filter condition.



Not really --- the startup cost is really defined as "cost expended
before we can start scanning for results". The estimated cost to select
N tuples is actually "startup_cost + N*(total_cost-startup_cost)/M",
where M is the estimated total rows returned. This is why the LIMIT
shows a nonzero estimate for the cost to fetch 1 row.

Unfortunately there''s no
easy way to fix this, because the statistics information does not have
information about the physical position of tuples with certain vaules.



Yeah, I think the real problem is that the desired rows are not
uniformly distributed, and in fact there are none near the start of the
table. We do not keep stats detailed enough to let the planner discover
this, so it has to estimate on the assumption of uniform distribution.
On that assumption, it looks like a seqscan will hit a suitable tuple
quickly enough to be faster than using the index.

regards, tom lane



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly


这篇关于无法使用索引?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆