Postgresql 9.x:用于优化xpath_exists(XMLEXISTS)查询的索引 [英] Postgresql 9.x: Index to optimize `xpath_exists` (XMLEXISTS) queries
问题描述
我们有以下形式的查询
select sum(acol)
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol)
什么索引可以可以加快where子句的构建速度?
What index can be built to speed up the where clause ?
使用
create index idx_01 using btree(xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol))
似乎根本没有使用。
编辑
将 enable_seqscan
设置为 off
,使用 xpath_exists
更快(一个数量级),并且可以使用相应的索引(使用 xpath_exists
构建的btree索引)清楚地显示。
Setting enable_seqscan
to off
, the query using xpath_exists
is much faster (one order of magnitude) and clearly shows using the corresponding index (the btree index built with xpath_exists
).
任何线索为何PostgreSQL不使用索引并尝试慢得多的顺序扫描?
Any clue why PostgreSQL would not be using the index and attempt a much slower sequential scan ?
因为我不想要全局禁用顺序扫描,我回到第一个方框
Since I do not want to disable sequential scanning globally, I am back to square one and I am happily welcoming suggestions.
编辑2-说明计划
请参阅下文-第一个计划的费用(关闭seqscan)略高,但处理时间很多更快
See below - Cost of first plan (seqscan off) is slightly higher but processing time much faster
b2box=# set enable_seqscan=off;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)
-> Aggregate (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)
-> Bitmap Heap Scan on item (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
-> Bitmap Index Scan on item_counter_01 (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)
Index Cond: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) = true)
Total runtime: 606.136 ms
(7 rows)
b2box=# set enable_seqscan=on;
SET
b2box=# explain analyze
Select count(*)
from B2HEAD.item
where cluster = 'B2BOX' and ( ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) ) ) offset 0 limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)
-> Aggregate (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)
-> Seq Scan on item (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)
Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
Rows Removed by Filter: 108945
Total runtime: 10864.242 ms
(6 rows)
推荐答案
计划者成本参数
Planner cost parameters
第一个计划的成本(关闭seqscan)稍高,但是处理时间快得多
Cost of first plan (seqscan off) is slightly higher but processing time much faster
这告诉我,您的 random_page_cost
和 seq_page_cost
可能是错误的。您可能使用快速随机I / O进行存储-是因为大多数数据库都缓存在RAM中,或者是因为您使用的是SSD,带有缓存的SAN或其他本质上是随机I / O较快的存储。
This tells me that your random_page_cost
and seq_page_cost
are probably wrong. You're likely on storage with fast random I/O - either because most of the database is cached in RAM or because you're using SSD, SAN with cache, or other storage where random I/O is inherently fast.
尝试:
SET random_page_cost = 1;
SET seq_page_cost = 1.1;
可以大大降低成本参数差异,然后重新运行。如果那样做的话,请考虑更改 postgresql.conf中的那些参数。
。
to greatly reduce the cost param differences and then re-run. If that does the job consider changing those params in postgresql.conf.
.
您的行数估算值是合理的,因此它看起来不像是计划者的错误估计问题或表格统计数据有问题。
Your row-count estimates are reasonable, so it doesn't look like a planner mis-estimation problem or a problem with bad table statistics.
您的查询也不正确。没有 ORDER BY
的 OFFSET 0 LIMIT 1
将产生不可预测的结果,除非您保证完全匹配在这种情况下, OFFSET ... LIMIT ...
子句是不必要的,可以完全删除。
Your query is also incorrect. OFFSET 0 LIMIT 1
without an ORDER BY
will produce unpredictable results unless you're guaranteed to have exactly one match, in which case the OFFSET ... LIMIT ...
clauses are unnecessary and can be removed entirely.
您使用诸如 SELECT max(...)
或 SELECT min(...)
这样的查询通常会好得多在可能的情况; PostgreSQL将倾向于能够使用索引来提取所需的值,而无需执行昂贵的表扫描或索引扫描和排序。
You're usually much better off phrasing such queries as SELECT max(...)
or SELECT min(...)
where possible; PostgreSQL will tend to be able to use an index to just pluck off the desired value without doing an expensive table scan or an index scan and sort.
BTW,对于将来的问题,PostgreSQL Wiki在性能类别以及提出慢查询问题的指南。
BTW, for future questions the PostgreSQL wiki has some good information in the performance category and a guide to asking Slow query questions.
这篇关于Postgresql 9.x:用于优化xpath_exists(XMLEXISTS)查询的索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!