How do I remove logically deleted documents from a Solr index?

Problem description

I am implementing Solr for a free text search for a project where the records available to be searched will need to be added and deleted on a large scale every day.

Because of the scale I need to make sure that the size of the index is appropriate.

On my test installation of Solr, I index a set of 10 documents. Then I make a change in one of the documents and want to replace the document with the same ID in the index. This works correctly and behaves as expected when I search.

I am using this code to update the document:

// Replace the indexed copy: delete it by its unique key, add the new version, then commit.
getSolrServer().deleteById(document.getIndexId());
getSolrServer().add(document.getSolrInputDocument());
getSolrServer().commit();

What I noticed, though, is that when I look at the stats page for the Solr server, the figures are not what I expect.

After the initial indexing, numDocs and maxDoc both equal 10, as expected. However, when I update the document, numDocs is still 10 (expected) but maxDoc is 11 (unexpected).

When reading the documentation I see that

maxDoc may be larger as the maxDoc count includes logically deleted documents that have not yet been removed from the index.
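
To see why the two counters diverge, here is a toy model in Java, the question's language. It is a simplification and not Lucene's implementation: Lucene records deletions per segment and only reclaims the space when segments are merged, which an optimize forces. The class `ToyIndex` and its methods are hypothetical names chosen for illustration; `add()` overwrites by id, as Solr does when a uniqueKey field is defined.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Solr or Lucene code. It mimics how an update leaves a
// logically deleted copy behind until the index is rewritten.
final class ToyIndex {
    // One stored slot per document version; "deleted" is the logical-delete flag.
    private static final class Slot {
        final String id;
        boolean deleted;

        Slot(String id) {
            this.id = id;
        }
    }

    private final List<Slot> slots = new ArrayList<>();

    // Adding an id that is already live first marks the old slot deleted,
    // mirroring overwrite-by-unique-key, then appends a new slot.
    void add(String id) {
        deleteById(id);
        slots.add(new Slot(id));
    }

    // Only flags matching live slots; nothing is physically removed here.
    void deleteById(String id) {
        for (Slot s : slots) {
            if (!s.deleted && s.id.equals(id)) {
                s.deleted = true;
            }
        }
    }

    // Counts live documents only (analogous to numDocs).
    int numDocs() {
        int live = 0;
        for (Slot s : slots) {
            if (!s.deleted) {
                live++;
            }
        }
        return live;
    }

    // Counts every slot, deleted or not (analogous to maxDoc).
    int maxDoc() {
        return slots.size();
    }

    // Rewrites the index without deleted slots (analogous to an optimize).
    void optimize() {
        slots.removeIf(s -> s.deleted);
    }
}
```

Replaying the question's steps (ten `add` calls, then `deleteById` and `add` for one id) leaves `numDocs()` at 10 and `maxDoc()` at 11; after `optimize()` both are 10 again.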

So the question is, how do I remove logically deleted documents from the index?

If these documents still exist in the index do I run the risk of performance penalties when this is run with a very large volume of documents?

Thanks :)

Recommended answer

You have to optimize the index.

Note that an optimize is expensive; you probably should not run it more often than once a day.
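
The question already uses SolrJ, whose server object also has an `optimize()` method, so `getSolrServer().optimize()` is the most direct route there. For a client-agnostic view, the following sketch (Java 11 `java.net.http`, no SolrJ) sends the same command as a request parameter to the update handler. The base URL `http://localhost:8983/solr`, the `/update` path, and the helper names `optimizeUri`/`sendOptimize` are assumptions for a default single-core install; check that your Solr version accepts `optimize=true` as a URL parameter.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: trigger an optimize through Solr's update handler using request
// parameters. The base URL passed in is a placeholder (assumption) for your install.
final class SolrOptimize {
    // Builds "<base>/update?optimize=true", tolerating a trailing slash.
    static URI optimizeUri(String solrBase) {
        String base = solrBase.endsWith("/")
                ? solrBase.substring(0, solrBase.length() - 1)
                : solrBase;
        return URI.create(base + "/update?optimize=true");
    }

    // Sends the request and returns the HTTP status code.
    // Needs a running Solr instance; the optimize itself can take a long time.
    static int sendOptimize(String solrBase) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(optimizeUri(solrBase))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

For example, `SolrOptimize.optimizeUri("http://localhost:8983/solr")` yields `http://localhost:8983/solr/update?optimize=true`. Given the cost mentioned above, calling `sendOptimize` from a scheduled off-peak job fits better than calling it after every update.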

Here is some more info on optimize:

http://www.lucidimagination.com/search/document/CDRG_ch06_6.3.1.3

http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
