Lucene: How to perform a search on several independent index sets and merge the results?
Question
I have several Lucene index sets (I call them shards), each of which indexes a different document set. They are independent, which means I can search each of them without reading the others. When a query request arrives, I want to run it against every index set and combine the results to form the final list of top documents.
I know that when scoring documents, Lucene needs the idf of every term, and different index sets will give different idf values for the same term (because they hold different document sets). So, as I understand it, I cannot directly compare document scores from different index sets. How should I generate the final result?
An obvious solution would be to merge the indexes first and then search the single large index. However, that is far too time-consuming for me and therefore unacceptable. Does anyone have a better solution?
P.S.: I don't want to use any packages or software (such as Katta) other than Lucene and Hadoop.
Recommended answer
I think MultiReader is what you are looking for. If you have multiple IndexReaders, say reader1 and reader2:
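// MultiReader exposes the shard readers as one logical index without physically merging them,
// so the searcher computes term statistics (and therefore idf) across all shards.
MultiReader multiReader = new MultiReader(reader1, reader2);
IndexSearcher searcher = new IndexSearcher(multiReader);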
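To illustrate why a single searcher over the combined reader addresses the idf concern from the question, here is a minimal standalone sketch that does not use Lucene. It assumes a textbook idf of ln(N / df), which is a simplification and not Lucene's actual similarity formula. The class ShardStatsSketch, its methods, and the shard numbers are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ShardStatsSketch {

    /** Hypothetical per-shard statistics for a single term. */
    static final class ShardStats {
        final long docCount; // number of documents in the shard
        final long docFreq;  // number of documents in the shard containing the term

        ShardStats(long docCount, long docFreq) {
            this.docCount = docCount;
            this.docFreq = docFreq;
        }
    }

    /** Textbook idf, ln(N / df). An assumption for illustration, not Lucene's formula. */
    static double idf(long docCount, long docFreq) {
        return Math.log((double) docCount / docFreq);
    }

    /** idf computed from statistics summed over all shards, as if they were one index. */
    static double globalIdf(List<ShardStats> shards) {
        long docCount = 0;
        long docFreq = 0;
        for (ShardStats s : shards) {
            docCount += s.docCount;
            docFreq += s.docFreq;
        }
        return idf(docCount, docFreq);
    }

    public static void main(String[] args) {
        // Made-up numbers: the term is rare in the first shard and common in the second.
        List<ShardStats> shards = Arrays.asList(
                new ShardStats(1000, 10),
                new ShardStats(1000, 500));

        // Scored separately, each shard weights the same term very differently.
        for (ShardStats s : shards) {
            System.out.printf("per-shard idf: %.3f%n", idf(s.docCount, s.docFreq));
        }
        // With summed statistics, every document is scored with the same term weight.
        System.out.printf("global idf:    %.3f%n", globalIdf(shards));
    }
}
```

When each shard is scored on its own, the same term gets a different weight per shard, so the scores are not comparable. Summing the statistics first gives one weight that applies to every document, which is in effect what searching through the combined reader provides.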