几乎相同的 XQuery (exist-db) 查询在速度上存在巨大差异 [英] Massive differences in speed with virtually identical XQuery (exist-db) interogations

查看:22
本文介绍了几乎相同的 XQuery (exist-db) 查询在速度上存在巨大差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在过去的几周里,我一直在移植一个 SQL 数据库,尽管到目前为止我已经克服了所有以前的障碍,但我现在遇到了一些官方文档、在线搜索、常识也无济于事.

I've been porting an SQL database over to exist over the last few weeks, and while I so far managed to get over all previous hindrances, I've now run into something for which neither the official documentation, searching online, nor common sense was able to help.

简而言之,我有一个相当大的集合(大约 90k 个条目,分布在 20 个文件中),大多数单个条目看起来像这样(这是一个巨大的简化,所以我可以理解这一点):

In short, I have a pretty big collection (about 90k entries, spread over 20 files), with most individual entries looking something like this (this is a massive simplification so I can get the point across):

<document>
 <document_id>Numerical Value</document_id>
 <page_id>Some other numerical value</page_id>
</document>

然后我通过 php 传递一个值,我们称之为 $val奇怪的是,当我运行标准查询时

I then pass a value through php, lets call it $val The strange part is that when I run the standard query

$p in collection("/db/folder_location")//documentset/document[xs:integer(document_id) eq $val]

无论我传递什么值,它都会在几秒钟内返回所有结果.但是,如果我稍微修改它,使其成为:

No matter what value I pass, it returns all the results in a matter of seconds. If I slightly modify it, however, making it:

$p in collection("/db/folder_location")//documentset/document[xs:integer(page_id) eq $val]

它要么需要超过 30 秒的时间来返回值,要么只是保持锁定在正在运行的查询中并且从不返回任何内容.在我已经转换的所有 30 个查询中,这是我唯一一次遇到这个问题并且找不到解决方法.

It either takes over 30 seconds to return the values or simply stays locked in a running query and never returns anything. Of all the 30 queries I already converted, this is the only time I ran into this problem and could not find a workaround.

推荐答案

为了解决查询性能问题,我建议您对查询进行一些更改和/或在 document_idpage_id 元素.

To address the query performance problem, I would suggest some changes to your query and/or the addition of a range index on the document_id and page_id elements.

您的查询将所有 document_idpage_id 元素转换为 xs:integer.考虑到大型数据集,这是一种低效的操作.考虑 (a) 移除此类型转换,(b) 反转它(将 $val 转换为 xs:string),或 (c) 在这两个元素上添加范围索引, 使用 type="xs:integer".后一个选项将允许您删除谓词中的强制转换(允许您将其更改为 document[document_id eq $val]document[page_id eq $val]),并且索引应该会大大加快查找速度.

Your query casts all document_id and page_id elements as xs:integer. This is an inefficient operation given a large dataset. Consider (a) removing this type casting, (b) reversing it (cast $val as xs:string), or (c) adding a range index on these two elements, with type="xs:integer". This latter option will let you remove the casting in your predicate (allowing you to change it to document[document_id eq $val] and document[page_id eq $val]), and the index should greatly speed the lookup.

要为您的查询添加范围索引,请像这样创建一个集合配置文档:

To add a range index for your query, create a collection configuration document like this:

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <range>
            <create qname="document_id" type="xs:integer"/>
            <create qname="page_id" type="xs:integer"/>
        </range>
    </index>
</collection>

如果您的数据在 /db/folder_location 中,则将此文档作为 collection.xconf 存储在 /db/system/config/db/folder_location.然后使用 xmldb:reindex("/db/system/config/db/folder_location") 重新索引您的集合.正如范围索引文档所述,使用此索引定义:

If your data is in /db/folder_location, then store this document as collection.xconf in /db/system/config/db/folder_location. Then reindex your collection with xmldb:reindex("/db/system/config/db/folder_location"). As the documentation on range indexes states, with this index definition in place:

索引将自动用于一般或值比较以及字符串函数,如 fn:containsfn:starts-withfn:ends-与.

indexes will be used automatically for general or value comparisons as well as string functions like fn:contains, fn:starts-with, fn:ends-with.

有关 eXist 中范围索引的更多信息,请参阅 https://exist-db.org/exist/apps/doc/newrangeindex.xml.有关查询优化技术,请参阅 https://exist-db.org/exist/应用程序/doc/tuning.xml.有关 eXist 中的一般索引,请参阅 https://exist-db.org/存在/apps/doc/indexing.xml.

For more on range indexes in eXist, see https://exist-db.org/exist/apps/doc/newrangeindex.xml. For query optimization techniques, see https://exist-db.org/exist/apps/doc/tuning.xml. For indexes in general in eXist, see https://exist-db.org/exist/apps/doc/indexing.xml.

这篇关于几乎相同的 XQuery (exist-db) 查询在速度上存在巨大差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆