eXist-db / XSLT / Saxon collection() slow as molasses (or errors out with memory limit)

Question

Following on from this question, I arrived at one entirely unsatisfactory solution for accessing an eXist-DB collection() from an XSLT 2.0 document loaded from within an eXist-db XQuery transformation function:

The XSLT file declares a variable:

 <xsl:variable name="coll" select="collection('xmldb:exist:///db/apps/deheresi/data/collection_ms609.xml')"/>

This points to a catalog XML file I created (per the Saxon documentation) that looks like this, in order to load the actual collection:

<collection stable="true">
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0001.xml"/>
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0002.xml"/>
  ...
  ...
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0709.xml"/>
  <doc href="xmldb:exist:///db/apps/deheresi/data/ms609_0710.xml"/>
</collection>

This allows the XSLT file to use a key that needs to search across all these files:

<xsl:key name="correspkey" match="tei:seg[@type='dep_event' and @corresp]" use="@corresp"/>

<xsl:variable name="correspvar" select="self::seg[@type='dep_event' and @corresp]/@corresp"/>

<xsl:value-of select="$coll/(key('correspkey',$correspvar) except $correspvar)/@id" separator=", "/>

As it stands, if I have 50 documents in the catalog, I get a result in 2 minutes; with all 710, I get a Java GC error after 4 minutes.

I have set indexes on the relevant nodes in eXist-DB, but this does nothing for performance. It seems to me that Saxon is working 'outside' eXist-DB's optimisations, treating eXist-DB as a simple file system.

(For what it's worth, setting href="/db/apps/deheresi/data/ms609_0001.xml" does not let Saxon see the documents.)

I suspect all of this is why the eXist-DB documentation is non-existent.

As it stands, I am looking for solutions for intensive searches of collections from within XSLT 2.0 loaded within eXist-DB by XQuery transform().

If anything, I hope this post helps future searchers encountering the same problem.

Recommended answer

The general architectural principle is: try to move the searching closer to the data. In this case, that means using eXist to find the documents of interest, rather than extracting every possible candidate document from eXist and then asking Saxon to do the searching. Select the actual documents of interest in an eXist XQuery, and then pass the list of these documents to Saxon in a stylesheet parameter.
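
As a rough illustration of that principle, here is a minimal XQuery sketch, not a tested implementation. It assumes the `tei` prefix is bound to the standard TEI namespace, that the data lives in `/db/apps/deheresi/data` as in the question, and that eXist's `transform:transform` accepts string-valued parameters in a `<parameters>` element. The `$corresp` value, the `hit-uris` parameter name, and the stylesheet path are hypothetical placeholders.

```xquery
xquery version "3.1";

(: Assumption: the tei prefix used in the question maps to the standard TEI namespace. :)
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical value; in practice it would come from the segment being rendered. :)
let $corresp := "#some-event-id"

(: Let eXist do the search, so that any configured indexes have a chance to be used. :)
let $hits := collection("/db/apps/deheresi/data")
                 //tei:seg[@type = "dep_event"][@corresp = $corresp]

(: Reduce the hits to the distinct URIs of their documents, joined into one string,
   because stylesheet parameters are passed here as plain string values. :)
let $uris := string-join(
    distinct-values($hits ! ("xmldb:exist://" || document-uri(root(.)))),
    " ")

let $params :=
    <parameters>
        <param name="hit-uris" value="{$uris}"/>
    </parameters>

return
    transform:transform(
        doc("/db/apps/deheresi/data/ms609_0001.xml"),
        doc("/db/apps/deheresi/resources/xsl/example.xsl"), (: hypothetical stylesheet path :)
        $params)
```

On the stylesheet side, a matching `<xsl:param name="hit-uris"/>` could then be split with `tokenize($hit-uris, ' ')` and each URI opened with `doc()`, so Saxon only loads the handful of documents eXist has already identified instead of all 710. Whether the predicate is actually served by an index depends on how the collection's indexes are configured.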
