Postgresql 9.x:用于优化xpath_exists(XMLEXISTS)查询的索引 [英] Postgresql 9.x: Index to optimize `xpath_exists` (XMLEXISTS) queries

查看:117
本文介绍了Postgresql 9.x:用于优化xpath_exists(XMLEXISTS)查询的索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有以下形式的查询

select sum(acol)
where xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol)

什么索引可以可以加快where子句的构建速度?

What index can be built to speed up the where clause ?

使用

create index idx_01 using btree(xpath_exists('/Root/KeyValue[Key="val"]/Value//text()', xmlcol))

似乎根本没有使用。

编辑

enable_seqscan 设置为 off ,使用 xpath_exists 更快(一个数量级),并且可以使用相应的索引(使用 xpath_exists 构建的btree索引)清楚地显示。

Setting enable_seqscan to off, the query using xpath_exists is much faster (one order of magnitude) and clearly shows using the corresponding index (the btree index built with xpath_exists).

任何线索为何PostgreSQL不使用索引并尝试慢得多的顺序扫描?

Any clue why PostgreSQL would not be using the index and attempt a much slower sequential scan ?

因为我不想要全局禁用顺序扫描,我回到第一个方框

Since I do not want to disable sequential scanning globally, I am back to square one and I am happily welcoming suggestions.

编辑2-说明计划

请参阅下文-第一个计划的费用(关闭seqscan)略高,但处理时间很多更快

See below - Cost of first plan (seqscan off) is slightly higher but processing time much faster

b2box=# set enable_seqscan=off;
SET
b2box=# explain analyze
Select count(*) 
from B2HEAD.item
where cluster = 'B2BOX' and (  ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) )  )  offset 0 limit 1;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)
   ->  Aggregate  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)
         ->  Bitmap Heap Scan on item  (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)
               Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
               ->  Bitmap Index Scan on item_counter_01  (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)
                     Index Cond: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) = true)
 Total runtime: 606.136 ms
(7 rows)

计划在explain.depesz.com上

b2box=# set enable_seqscan=on;
SET
b2box=# explain analyze
Select count(*) 
from B2HEAD.item
where cluster = 'B2BOX' and (  ( xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()', content) )  )  offset 0 limit 1;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)
   ->  Aggregate  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)
         ->  Seq Scan on item  (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)
               Filter: (xpath_exists('/MessageInfo[FinalRecipient="ABigBank"]//text()'::text, content, '{}'::text[]) AND ((cluster)::text = 'B2BOX'::text))
               Rows Removed by Filter: 108945
 Total runtime: 10864.242 ms
(6 rows)

对explain.depesz.com的计划

推荐答案

计划者成本参数



Planner cost parameters


第一个计划的成本(关闭seqscan)稍高,但是处理时间快得多

Cost of first plan (seqscan off) is slightly higher but processing time much faster

这告诉我,您的 random_page_cost seq_page_cost 可能是错误的。您可能使用快速随机I / O进行存储-是因为大多数数据库都缓存在RAM中,或者是因为您使用的是SSD,带有缓存的SAN或其他本质上是随机I / O较快的存储。

This tells me that your random_page_cost and seq_page_cost are probably wrong. You're likely on storage with fast random I/O - either because most of the database is cached in RAM or because you're using SSD, SAN with cache, or other storage where random I/O is inherently fast.

尝试:

SET random_page_cost = 1;
SET seq_page_cost = 1.1;

可以大大降低成本参数差异,然后重新运行。如果那样做的话,请考虑更改 postgresql.conf中的那些参数。

to greatly reduce the cost param differences and then re-run. If that does the job consider changing those params in postgresql.conf..

您的行数估算值是合理的,因此它看起来不像是计划者的错误估计问题或表格统计数据有问题。

Your row-count estimates are reasonable, so it doesn't look like a planner mis-estimation problem or a problem with bad table statistics.

您的查询也不正确。没有 ORDER BY OFFSET 0 LIMIT 1 将产生不可预测的结果,除非您保证完全匹配在这种情况下, OFFSET ... LIMIT ... 子句是不必要的,可以完全删除。

Your query is also incorrect. OFFSET 0 LIMIT 1 without an ORDER BY will produce unpredictable results unless you're guaranteed to have exactly one match, in which case the OFFSET ... LIMIT ... clauses are unnecessary and can be removed entirely.

您使用诸如 SELECT max(...) SELECT min(...)这样的查询通常会好得多在可能的情况; PostgreSQL将倾向于能够使用索引来提取所需的值,而无需执行昂贵的表扫描或索引扫描和排序。

You're usually much better off phrasing such queries as SELECT max(...) or SELECT min(...) where possible; PostgreSQL will tend to be able to use an index to just pluck off the desired value without doing an expensive table scan or an index scan and sort.

BTW,对于将来的问题,PostgreSQL Wiki在性能类别以及提出慢查询问题的指南

BTW, for future questions the PostgreSQL wiki has some good information in the performance category and a guide to asking Slow query questions.

这篇关于Postgresql 9.x:用于优化xpath_exists(XMLEXISTS)查询的索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆