Lucene: Completely disable weighting, scoring, ranking

Published: 2020/5/4 7:40:59 | Tags: lucene

Problem description

I'm using Lucene to build a big index of token co-occurrences (e.g. [elephant,animal], [melon,fruit], [bmw,car], ...). I query the index for those co-occurrences using a BooleanQuery to get an absolute count of how often the two tokens co-occurred in my index, like so:

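import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;

// search for documents which contain word+category (Lucene 4.x API; 5.3+ uses BooleanQuery.Builder)
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("word", word)), Occur.MUST);
query.add(new TermQuery(new Term("category", category)), Occur.MUST);
// only care about the total number of hits; searcher is an already-open IndexSearcher
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(query, collector);
int count = collector.getTotalHits();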
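
For context on why a pure count should be cheap: with two MUST clauses, the hit count is simply the number of documents that appear in both terms' posting lists. The stdlib-only sketch below models that with hypothetical in-memory posting lists (sorted arrays of document IDs standing in for word:elephant and category:animal). `ConjunctionCount` and `countIntersection` are illustrative names, not Lucene classes, and this is a simplified model rather than Lucene's actual code path.

```java
class ConjunctionCount {

    // Counts doc IDs present in both sorted, duplicate-free posting lists.
    // Each array hypothetically stands in for the documents containing one term.
    // No score is computed: every match just increments a counter.
    static int countIntersection(int[] a, int[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                count++;      // document contains both terms
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;          // advance the list that is behind
            } else {
                j++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical postings for word=elephant and category=animal.
        int[] wordPostings = {1, 4, 7, 9, 12};
        int[] categoryPostings = {2, 4, 9, 10, 12, 15};
        System.out.println(countIntersection(wordPostings, categoryPostings)); // prints 3
    }
}
```

Every match only bumps a counter, which is all a hit-counting collector needs; any per-query weighting work is overhead for this use case.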

These queries run very frequently and I'm currently not satisfied with their performance. I discovered that the method BooleanQuery#createWeight takes a lot of time. I do not need any scoring or ranking of my results, as I'm interested in absolute document counts only.

Is there a convenient way (e.g. a pre-existing class) to completely disable scoring and weighting? If not, are there any hints as to which classes I need to extend for my use case?

Recommended answer

I'm not quite sure if it will bypass scoring in such a way as to get the performance increase you are looking for, but an easy way to apply a constant score would be to wrap the query in a ConstantScoreQuery, like:

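import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;

BooleanQuery bq = new BooleanQuery();
// etc.: add the word and category clauses as in the question
ConstantScoreQuery query = new ConstantScoreQuery(bq);
searcher.search(query, collector);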

I would, however, strongly recommend making use of Filters. Not only do filters bypass scoring, they can also cache their results, and your "category" field in particular seems like a very good candidate for this. The first time you query a category using a filter, it will take longer because the cache for that filter has to be built, but after that you should see a very significant increase in speed. Take a look at FieldCacheTermsFilter.

For example:

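import org.apache.lucene.index.Term;
import org.apache.lucene.search.*; // Query, TermQuery, Filter, FieldCacheTermsFilter, TotalHitCountCollector (Lucene 4.x)

Query query = new TermQuery(new Term("word", word));
Filter filter = new FieldCacheTermsFilter("category", category);
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(query, filter, collector);
int count = collector.getTotalHits();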
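
The caching behaviour described in the answer (a slower first query per category, fast queries afterwards) can be modelled with plain Java collections. The sketch below is hypothetical and stdlib-only: `CategoryFilterCache`, the `categoryOf` array and the `builds` counter are illustrative names, not Lucene APIs. It models per-filter caching in general (closer to wrapping a filter in a cache) rather than FieldCacheTermsFilter's exact internals: a `BitSet` of matching documents is built on the first request for a category, reused afterwards, and hits are counted without any scoring.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class CategoryFilterCache {

    // Hypothetical single-valued "category" field: categoryOf[docId] is that doc's category.
    private final String[] categoryOf;
    // Lazily filled cache: category -> set of doc IDs in that category.
    private final Map<String, BitSet> cache = new HashMap<>();
    // Number of times a BitSet had to be built (illustration only).
    int builds = 0;

    CategoryFilterCache(String[] categoryOf) {
        this.categoryOf = categoryOf;
    }

    // Returns the cached BitSet for a category, building it with one full scan on first use.
    BitSet docsInCategory(String category) {
        return cache.computeIfAbsent(category, c -> {
            builds++;
            BitSet bits = new BitSet(categoryOf.length);
            for (int doc = 0; doc < categoryOf.length; doc++) {
                if (c.equals(categoryOf[doc])) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Counts docs that contain the word AND pass the category filter; no scores involved.
    int count(int[] wordPostings, String category) {
        BitSet filter = docsInCategory(category);
        int hits = 0;
        for (int doc : wordPostings) {
            if (filter.get(doc)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] categoryOf = {"fruit", "animal", "car", "animal", "fruit", "animal"};
        CategoryFilterCache filters = new CategoryFilterCache(categoryOf);
        int[] elephantDocs = {1, 2, 3, 5};
        System.out.println(filters.count(elephantDocs, "animal")); // prints 3 (builds the BitSet)
        System.out.println(filters.count(elephantDocs, "animal")); // prints 3 (served from cache)
        System.out.println(filters.builds);                        // prints 1
    }
}
```

The one-off scan per category is the "first query is slower" cost; later lookups reduce to bit tests, which is why a frequently reused field like "category" benefits most.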