Lucene: How to perform a search on several independent index sets and merge the results?


Question

I have several Lucene index sets (I call them shards), each of which indexes a different set of documents. They are independent, meaning I can search any one of them without reading the others. When a query request arrives, I want to run it against every index set and combine the results into the final list of top documents.

I know that when scoring documents, Lucene needs the idf of every term, and that different index sets will give different idf values for the same term (because they hold different document sets). So, as I understand it, I cannot directly compare document scores from different index sets. How should I generate the final result?

An obvious solution would be to merge the indexes first and then search the single large index. However, that is too time-consuming for me and therefore unacceptable. Does anyone have a better solution?

P.S.: I don't want to use any packages or software (such as Katta) other than Lucene and Hadoop.

Recommended answer

I think MultiReader is what you are looking for. If you have multiple IndexReaders, say reader1 and reader2:

MultiReader multiReader = new MultiReader(reader1, reader2);
IndexSearcher searcher = new IndexSearcher(multiReader);
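
As for the idf concern: a MultiReader presents its sub-readers as one logical index, so the term statistics the searcher sees (how many documents contain a term, how many documents there are in total) are aggregated over all shards, and the scores come from one shared idf rather than a per-shard one. The sketch below is plain Java, not Lucene code, and only illustrates that arithmetic. The class name, the shard numbers, and the textbook-style idf formula are all assumptions made for illustration; the exact formula depends on the Similarity your Lucene version uses.

```java
public class GlobalIdfSketch {

    /**
     * Textbook-style idf, used here only for illustration.
     * Assumption: this is not necessarily the formula of your Lucene version.
     */
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) (numDocs + 1) / (docFreq + 1));
    }

    /**
     * Sums the per-shard document frequencies and document counts,
     * then computes a single idf from the combined statistics.
     */
    public static double globalIdf(long[] docFreqs, long[] numDocs) {
        long totalDocFreq = 0;
        long totalNumDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalNumDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalNumDocs);
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one term:
        // rare in shard A (9 of 999 docs), common in shard B (499 of 999 docs).
        long[] docFreqs = {9, 499};
        long[] numDocs = {999, 999};

        System.out.printf("idf from shard A alone: %.3f%n", idf(docFreqs[0], numDocs[0]));
        System.out.printf("idf from shard B alone: %.3f%n", idf(docFreqs[1], numDocs[1]));
        System.out.printf("idf from combined stats: %.3f%n", globalIdf(docFreqs, numDocs));
    }
}
```

With these made-up numbers the same term gets a high idf in shard A and a low one in shard B, while the combined statistics yield a single value in between. That is why scores taken from one searcher over all shards can be ranked together, whereas scores from separate per-shard searchers cannot.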
