Different Lucene search results using different search space size

Problem description

I have an application that uses Lucene for searching. The search space is in the thousands of entries. Searching against these thousands, I get only a few results, around 20 (which is fine and expected).

However, when I reduce my search space to just those 20 entries (i.e. I index only those 20 entries and disregard everything else, so that development is easier), I get the same 20 results but in a different order and with different scores.

I tried disabling the norm factors via Field#setOmitNorms(true), but I still get different results.

What is causing the difference in scores?

Thanks

Recommended answer

Please see the scoring documentation in Lucene's Similarity API. My bet is on the difference in idf between the two cases (both numDocs and docFreq are different). In order to know for sure, use the explain() function to debug the scores.

Code snippet for getting the explanation:

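// Assumes an existing IndexSearcher (searcher), Query (query), Filter (searchFilter) and int (max).
TopDocs hits = searcher.search(query, searchFilter, max);
ScoreDoc[] scoreDocs = hits.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
  // explain() describes how the score for this document was computed (tf, idf, norms, ...)
  String explanation = searcher.explain(query, scoreDoc.doc).toString();
  Log.debug(explanation);
}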
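
To make the idf point above concrete, here is a minimal sketch that does not use Lucene at all. It assumes the classic TF-IDF similarity, whose idf is 1 + ln(numDocs / (docFreq + 1)); other Similarity implementations use different formulas. The class name IdfDemo, the index sizes, and the document frequencies are hypothetical values chosen for illustration only. Norms are a separate factor from idf, which would be consistent with setOmitNorms(true) not removing the difference.

```java
public class IdfDemo {

    // Classic TF-IDF inverse document frequency (assumed formula):
    // idf = 1 + ln(numDocs / (docFreq + 1))
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical setup: one query term matches all 20 hits,
        // another matches only 5 of them. The full index holds 5000
        // documents, the reduced index holds only the 20 hits.
        int commonDocFreq = 20;
        int rareDocFreq = 5;
        int[] indexSizes = {5000, 20};

        for (int numDocs : indexSizes) {
            double common = idf(commonDocFreq, numDocs);
            double rare = idf(rareDocFreq, numDocs);
            // The ratio shows how much more the rare term weighs than
            // the common one; it changes with the index size.
            System.out.printf("numDocs=%d idf(common)=%.3f idf(rare)=%.3f ratio=%.3f%n",
                    numDocs, common, rare, rare / common);
        }
    }
}
```

Under these assumptions the rare term outweighs the common term by a much larger factor in the 20-document index than in the 5000-document one, so documents that match different subsets of the query terms can swap places in the ranking. The explain() output for the same document in both indexes should show the differing idf values directly.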
