Lucene .Net optimization process
Problem description
I am creating an index using Lucene .Net 2.9.2. After a lot of indexing, the index has many segments and deleted documents, so I am calling Optimize(numSegments) on the IndexWriter.
The index's segment count is indeed reduced to the value of numSegments, but it still has deletions. Shouldn't a call to Optimize also remove all deleted documents?
My question is important because I need to know whether this is how Lucene works or whether I have a bug.
Here is my code snippet:
IndexWriter writer = new IndexWriter(/*open writer from index directory*/);
writer.Optimize(5);
writer.Commit();
bool hasDeletions = writer.HasDeletions();
hasDeletions is true, while I was expecting it to be false.
Recommended answer
Deletions can remain unless you provide 1 as the maximum number of segments.
But you shouldn't worry about this. To quote the documentation for IndexWriter#optimize in Lucene 3.5:
This method has been deprecated, as it is horribly inefficient and very rarely justified. Lucene's multi-segment search performance has improved over time, and the default TieredMergePolicy now targets segments with deletions.
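If the deletions really do need to be purged, the first point of the answer can be sketched as below. This is a minimal sketch, assuming Lucene .Net 2.9.2 and the usual way of opening a writer in that version; the index path, the choice of StandardAnalyzer, and the class name PurgeDeletions are hypothetical placeholders that are not part of the original question.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

class PurgeDeletions
{
    static void Main(string[] args)
    {
        // Hypothetical index location; substitute your own directory.
        var dir = FSDirectory.Open(new DirectoryInfo("path/to/index"));

        // Assumed analyzer; use whichever analyzer the index was built with.
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);

        // create: false opens the existing index instead of overwriting it.
        var writer = new IndexWriter(dir, analyzer, false,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // Merging down to a single segment rewrites the index,
            // which is what drops the deleted documents.
            writer.Optimize(1);
            writer.Commit();

            // Same check as in the question, now after a full optimize.
            bool hasDeletions = writer.HasDeletions();
            Console.WriteLine("hasDeletions = " + hasDeletions);
        }
        finally
        {
            writer.Close();
        }
    }
}
```

A full optimize rewrites the whole index, so it can be slow on large indexes. IndexWriter in this version line also offers ExpungeDeletes(), which is meant to reclaim deleted documents without forcing the index down to one segment; check its behavior against the documentation for your exact version before relying on it.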