Lucene .Net optimization process


Question

I am creating an index using Lucene .Net 2.9.2. After a lot of indexing, the index has many segments and deleted documents, so I am calling Optimize(numSegments) on the IndexWriter.

The index's segment count is indeed reduced to the value of numSegments, but the index still has deletions. Shouldn't a call to Optimize also remove all deleted documents?

This matters to me because I need to know whether this is how Lucene works or whether I have a bug.

Here is my code snippet:

IndexWriter writer = new IndexWriter(/*open writer from index directory*/);
writer.Optimize(5);
writer.Commit();

bool hasDeletions = writer.HasDeletions();

hasDeletions is true, while I was expecting it to be false.

Recommended answer

Deletions can remain unless you provide 1 as the maximum number of segments.
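As a rough illustration of that point, the sketch below opens an existing index and merges it down to a single segment before checking for deletions. It assumes the Lucene .Net 2.9.x API (FSDirectory.Open, the four-argument IndexWriter constructor, Optimize(int), Commit, HasDeletions, Close). The index path, the choice of StandardAnalyzer, and the class name are placeholders that do not come from the question. It is not a recommendation to optimize routinely.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

class RemoveDeletionsSketch
{
    static void Main(string[] args)
    {
        // Hypothetical index location; replace with the real index directory.
        string indexPath = args.Length > 0 ? args[0] : "index";

        // Fully qualified to avoid clashing with System.IO.Directory.
        Lucene.Net.Store.Directory dir = FSDirectory.Open(new DirectoryInfo(indexPath));

        // Assumption: StandardAnalyzer stands in for whatever analyzer built the index.
        IndexWriter writer = new IndexWriter(
            dir,
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
            false, // open the existing index instead of creating a new one
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // Merging down to one segment rewrites every segment, so documents
            // marked as deleted are dropped during the merge.
            writer.Optimize(1);

            // Possible alternative (assumption, not from the answer):
            // writer.ExpungeDeletes() asks the merge policy to merge segments
            // that carry deletions without forcing a single segment.

            writer.Commit();
            Console.WriteLine("HasDeletions: " + writer.HasDeletions());
        }
        finally
        {
            writer.Close();
        }
    }
}
```

With a larger value such as Optimize(5), segments that are not picked for merging are left untouched, so any deletions they hold stay in the index.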

But you shouldn't worry about this. To quote the documentation for IndexWriter#optimize in Lucene 3.5:

This method has been deprecated, as it is horribly inefficient and very rarely justified. Lucene's multi-segment search performance has improved over time, and the default TieredMergePolicy now targets segments with deletions.
