Lucene: How to perform search on several independent index sets and merge the result?

Views: 69 | Posted: 2020/5/4 7:30:33 | Tags: java, search, lucene, search-engine

Problem description

I have several Lucene index sets (I call them shards), each of which indexes a different set of documents. They are independent, which means I can search any one of them without reading the others. When a query request arrives, I want to run it against every index set and combine the results to form the final list of top documents.
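
To make the intended "search every shard, then combine" step concrete, here is a minimal sketch in plain Java. The `Hit` type and the `mergeTopK` method are hypothetical names invented for illustration, not Lucene classes, and the sketch assumes the per-shard scores can be compared directly, which is exactly the point in doubt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ShardMergeDemo {
    /** One hit returned by one shard (hypothetical type, not part of Lucene). */
    public static final class Hit {
        public final int shard;
        public final int doc;
        public final float score;

        public Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Pools the hits of all shards and keeps the k highest-scoring ones. */
    public static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        // Sort by descending score; only meaningful if scores are comparable across shards.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        // Made-up hits for two shards, purely for illustration.
        List<Hit> shard0 = Arrays.asList(new Hit(0, 1, 2.5f), new Hit(0, 2, 1.0f));
        List<Hit> shard1 = Arrays.asList(new Hit(1, 7, 3.0f), new Hit(1, 8, 0.5f));
        for (Hit h : mergeTopK(Arrays.asList(shard0, shard1), 3)) {
            System.out.println("shard=" + h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Lucene itself provides a `TopDocs.merge` helper for combining per-shard `TopDocs`, but like this sketch it only sorts by the scores it is given, so it does not by itself make scores from different shards comparable.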

I know that when scoring documents, Lucene needs the idf (inverse document frequency) of every term, and that different index sets will give different idf values for the same term, because they hold different document sets. So, as I understand it, I cannot directly compare document scores that come from different index sets. How, then, should I generate the final result?
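
The idf concern can be illustrated with a small sketch in plain Java. It assumes the classic TF-IDF formulation `idf = 1 + ln(numDocs / (docFreq + 1))`; the exact formula depends on the Lucene version and the configured similarity, and the shard statistics below are made-up numbers. The method name `idf` is illustrative, not a Lucene API.

```java
public class ShardIdfDemo {
    /** Classic TF-IDF style idf (an assumption; Lucene's formula varies by version). */
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical statistics for the same term in two shards: {docFreq, numDocs}.
        long[][] shards = { {10, 1000}, {400, 1000} };

        long totalDocFreq = 0;
        long totalNumDocs = 0;
        for (int i = 0; i < shards.length; i++) {
            long docFreq = shards[i][0];
            long numDocs = shards[i][1];
            System.out.printf("shard %d idf = %.4f%n", i, idf(docFreq, numDocs));
            totalDocFreq += docFreq;
            totalNumDocs += numDocs;
        }

        // Summing docFreq and numDocs over all shards gives the idf a single merged index would use.
        System.out.printf("global idf = %.4f%n", idf(totalDocFreq, totalNumDocs));
    }
}
```

The term is rare in the first shard and common in the second, so the two shards weight it very differently. Under the stated assumption, a consistent value can be obtained by summing the per-shard document frequencies and document counts, without physically merging the indexes.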

An obvious solution would be to merge the indexes first and then search the single large index. However, this is too time-consuming for me and therefore unacceptable. Does anyone have a better solution?

P.S.: I don't want to use any packages or software (such as Katta) other than Lucene and Hadoop.
