Using predicates on a large database?

Problem description

I have a database of 50,000,000 documents, and I'd like to write the base-uri of each document to a file. Running over the entire 50,000,000 takes too long (the query times out), so I thought I'd use predicates to break the database into more manageable batches. I tried the following to get a handle on its performance:

for $i in ( 49999000 to 50000000 )
return fn:base-uri( /mainDoc[position()=$i] )

But performance was very slow even for these 1,000 base-uris; in fact, the query timed out. I tried a similar query and got similar results (or rather, the same lack of results):

for $i in ( /mainDoc ) [ 49999000 to 50000000 ]
return fn:base-uri( $i ) 

Is there a more performant way of looping through a large database, where documents at the end of the database are just as quick to obtain as those at the beginning?

Recommended answer

If you just want the document URIs, that's easy: make sure the document lexicon is enabled and run a cts:uris() call.
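
For example, here is a minimal sketch (assuming that lexicon is enabled on the database) that pulls a slice of URIs straight from the lexicon without loading any documents; the range is taken from the question:

(: Pull URIs directly from the lexicon; no documents are fetched. :)
for $uri in cts:uris()[49999000 to 50000000]
return $uri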

To follow your approach of jumping ahead in the document list and doing something with each document, you can run the search unfiltered to make it fast:

for $item in cts:search(/mainDoc, cts:and-query(()), "unfiltered")[49999000 to 50000000]
return base-uri($item)

The cts:and-query(()) is a shortcut way to pass an always-true query.
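
As an illustrative follow-up sketch (not part of the original answer), the same always-true query can also be handed to cts:uris(), and xdmp:save() can write a batch of URIs to a file; the output path and batch range here are assumptions:

(: Illustrative only: grab one batch of URIs from the lexicon and save them to a file.
   The path and range are placeholder assumptions. :)
let $batch := cts:uris((), (), cts:and-query(()))[49999000 to 50000000]
return xdmp:save("/tmp/base-uris-batch.txt",
  document { text { string-join($batch, "&#10;") } })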
