How to process a large number of documents in chunks to avoid an "expanded tree cache full" error

Problem description

I have an entity in MarkLogic with around 98k+ documents (/someEntity/[ID].xml), and I need to add a few new tags to all of those documents.

I prepared a query to add the child nodes, but running it against that entity fails with an "expanded tree cache full" error. I increased the cache memory by a few more gigabytes; the query then works, but it takes a long time to complete. I also tried xdmp:clear-expanded-tree-cache(), and that did not help either.

Any pointers on how to fetch the URIs in chunks of 10k and process them, so that memory does not spike and the query does not throw an error partway through?

Recommended answer

Hitting the expanded tree cache limit suggests that you are holding the full result set somewhere, which is probably unnecessary. There may be ways to make your code smarter, so that it streams through the results and forgets about each one as soon as possible. As a rule of thumb: don't assign complete result sets to let variables.

However, sometimes it is easier to just batch up the work. Corb, as suggested by Michael Gardner, is an excellent choice for this. It can throttle the load on MarkLogic from the outside, and slow the pace if needed.

For smaller tasks like this, something like taskbot might do the trick as well, though it is harder to control its pace.
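To make the batching idea concrete, here is a minimal client-side sketch in Python. It is not Corb or taskbot, and it does not call any MarkLogic API: `chunked`, `process_in_batches`, and the `process_batch` callback are hypothetical names, and `process_batch` stands in for whatever request you would send to MarkLogic to update one batch. The point is only that each batch is handed off as its own unit of work, so no single request has to touch all 98k documents.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(uris: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items, one batch at a time."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[str] = []
    for uri in uris:
        batch.append(uri)
        if len(batch) == size:
            yield batch
            batch = []  # start a fresh list so earlier batches can be released
    if batch:  # final, possibly shorter, batch
        yield batch


def process_in_batches(
    uris: Iterable[str],
    size: int,
    process_batch: Callable[[List[str]], None],
) -> int:
    """Pass each batch to `process_batch` separately; return the number of batches."""
    count = 0
    for batch in chunked(uris, size):
        # Hypothetical hook: in practice this would be one request (and one
        # transaction) against MarkLogic that updates only these documents.
        process_batch(batch)
        count += 1
    return count


if __name__ == "__main__":
    # Made-up URIs following the /someEntity/[ID].xml pattern from the question.
    uris = (f"/someEntity/{i}.xml" for i in range(98_000))
    sizes: List[int] = []
    batches = process_in_batches(uris, 10_000, lambda b: sizes.append(len(b)))
    print(batches, sizes[-1])
```

With 98,000 URIs and a batch size of 10,000, this produces ten batches, the last one holding 8,000 URIs. Adding a pause between batches inside `process_batch` would be one simple way to control the pace from the outside.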

HTH!
