Lucene filter with docIds
Question
I'm trying to do the following: I want to build a set of candidates by querying each field separately and adding the top k matches of each query to the set. Once that is done, I need to run another query restricted to this candidate set. My current implementation uses a QueryWrapperFilter with a BooleanQuery that matches the unique id field of each candidate document. However, this means I have to call IndexSearcher.doc().get("docId") for each candidate document before I can add it to my BooleanQuery, which is the major bottleneck. I'm only loading the docId field, via MapFieldSelector("docId").
I wanted to write my own Filter class, but I can't use the internal Lucene doc ids directly, because they are assigned per segment. Any thoughts on how to approach this?
Answer
Instead of reading the stored docId field, index the field (it probably already is indexed) and use the FieldCache to look up docIds, which is much faster than loading stored fields for every hit. Then, instead of matching the docIds with a BooleanQuery, try a TermsFilter or a FieldCacheTermsFilter. The FieldCacheTermsFilter documentation describes the performance trade-offs between the two.
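To make the idea concrete, here is a minimal plain-Java sketch of what a cache-backed lookup and filter do conceptually. It deliberately uses no Lucene classes: the `cachedIds` array is a hypothetical stand-in for the per-document values a field cache would hold, and the class and method names (`DocIdFilterSketch`, `collectIds`, `filterByCachedIds`) are invented for illustration. The real FieldCacheTermsFilter is implemented differently in detail, so treat this as a model of the approach rather than the library's code.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class DocIdFilterSketch {

    // Step 1: turn internal positions of top-k hits into application-level ids
    // with an array lookup instead of loading a stored field per hit.
    // cachedIds[i] is assumed to hold the docId value of internal document i.
    public static Set<String> collectIds(int[] hits, String[] cachedIds) {
        Set<String> ids = new HashSet<>();
        for (int doc : hits) {
            ids.add(cachedIds[doc]);
        }
        return ids;
    }

    // Step 2: build the filter's bit set by scanning the cached values once
    // and keeping every document whose id is in the candidate set.
    public static BitSet filterByCachedIds(String[] cachedIds, Set<String> candidates) {
        BitSet accepted = new BitSet(cachedIds.length);
        for (int doc = 0; doc < cachedIds.length; doc++) {
            if (candidates.contains(cachedIds[doc])) {
                accepted.set(doc);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Made-up ids for five documents in one reader.
        String[] cachedIds = {"a17", "b02", "c44", "d09", "e31"};

        // Pretend two per-field queries returned these internal positions.
        Set<String> candidates = new HashSet<>();
        candidates.addAll(collectIds(new int[] {1, 4}, cachedIds));
        candidates.addAll(collectIds(new int[] {4}, cachedIds));

        BitSet accepted = filterByCachedIds(cachedIds, candidates);
        System.out.println(accepted); // prints {1, 4}
    }
}
```

Because the candidate set holds application-level ids rather than internal positions, the same set can be applied to each segment's cached array in turn, which sidesteps the per-segment numbering problem raised in the question. The cost of this style of filter is a full scan over the cached values, so it tends to pay off when the candidate set is large relative to the index.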