Can we stop a document from being crawled again when only some of its non-important fields are updated?


Problem description

Hello,

We have a SharePoint (SP) 2013 production environment with 8 million documents in the search index and two dedicated search servers running all search components. Continuous crawl is enabled to keep the index fresh.

On the application side, we have some custom list columns on the documents that capture user view counts. These fields are updated whenever a user opens a document, so the document is unnecessarily crawled again by the continuous crawl.

In other words, updating these fields marks the document as modified, so the search crawler picks it up and adds it to the crawl queue, which we really don't need.

Is there any option, such as custom crawl rules, to prevent a document from being crawled when only some of its fields (the view counters) are updated like this?

Please let us know if you have any pointers.

Recommended answer

Hi AK,

Yes, in the Search Service application you can exclude content from being crawled.

As you mentioned, you have custom list columns on the document library.

Create a new crawl rule.

Path ==> *://*/"document library or custom list"*

Avinash Shinde [MCP, Microsoft Partner, MCTS]
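
To get a feel for what a wildcard path like the one above would cover, here is a rough Python sketch. It only approximates wildcard matching with the standard `fnmatch` module and is not SharePoint's actual matcher; the library name `ViewTrackedDocs` and the example URLs are made up for illustration. One thing it makes visible: an exclusion rule matches on the URL alone, so every item under the matching path is left out of the crawl, regardless of which field was updated.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_excluded(url: str, exclusion_patterns: Iterable[str]) -> bool:
    """Return True if the URL matches any wildcard exclusion pattern.

    Matching is case-insensitive here (an assumption for this sketch),
    and '*' matches any run of characters, including '/'.
    """
    lowered = url.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in exclusion_patterns)


# Hypothetical exclusion pattern in the same shape as the path above.
patterns = ["*://*/ViewTrackedDocs*"]

# Hypothetical URLs: one inside the excluded library, one outside it.
urls = [
    "https://intranet.example.com/sites/hr/ViewTrackedDocs/policy.docx",
    "https://intranet.example.com/sites/hr/Shared Documents/handbook.docx",
]

for url in urls:
    status = "excluded" if is_excluded(url, patterns) else "crawled"
    print(f"{url} -> {status}")
```

Running candidate URLs through a check like this before creating the rule can help confirm that the pattern is not broader than intended.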

