Using predicates on a large database?
Question
I have a database of 50,000,000 documents, and I'd like to write the base URI of each document to a file. Running over all 50,000,000 takes too long (the query times out), so I thought I'd use predicates to break the database into more manageable batches. I tried the following to get a handle on its performance:
for $i in ( 49999000 to 50000000 )
return fn:base-uri( /mainDoc[position()=$i] )
But performance was very slow for these 1,000 base URIs; in fact, the query timed out. I tried a similar query and got similar results (or lack of results):
for $i in ( /mainDoc ) [ 49999000 to 50000000 ]
return fn:base-uri( $i )
Is there a more performant way to loop through a large database, so that documents at the end of the database are as quick to obtain as those at the beginning?
Recommended answer
If you just want the document URIs, that's easy. Ensure you have the URI lexicon enabled and run a cts:uris() call.
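Since the original goal was to process the URIs in manageable batches, the cts:uris() call can also be paged. The following is a minimal sketch, not a definitive implementation: the external variables $last-uri and $batch-size are hypothetical names that the caller would supply, and it assumes the URI lexicon is enabled on the database.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs supplied by the caller:
   $last-uri   - last URI returned by the previous batch ("" for the first batch)
   $batch-size - number of URIs to return per batch :)
declare variable $last-uri as xs:string external;
declare variable $batch-size as xs:integer external;

(: cts:uris reads from the URI lexicon. Its first argument is an inclusive
   starting value, so request one extra URI and drop the one already seen. :)
let $uris := cts:uris($last-uri, ("document", fn:concat("limit=", $batch-size + 1)))
return fn:subsequence($uris[. ne $last-uri], 1, $batch-size)
```

Each run returns the next batch of URIs in lexicon order. Passing the last URI of one batch as $last-uri for the next run avoids counting through the earlier documents again, which is the part that made the positional predicates slow.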
To follow your approach of jumping ahead in the document list and doing something with each document, you can run the search unfiltered to make it fast:
for $item in cts:search(/mainDoc, cts:and-query(()), "unfiltered")[49999000 to 50000000]
return base-uri($item)
cts:and-query(()) is a shorthand for passing an always-true query.