What is the best Lucene setup for ranking exact matches the highest?


Question

Which analyzers should be used for indexing and for searching when I want an exact match to rank higher than a "partial" match? Or should I set up custom scoring in a Similarity class?

For example, when my index consists of "car parts", "car", and "car shop" (indexed with StandardAnalyzer on Lucene 3.5), a query for "car" returns:


  • car parts

  • car

  • car shop

(basically returned in the order in which they were added, since they all get the same score).

What I would like to see is "car" ranked first, followed by the other results (their relative order doesn't really matter; I assume the analyzer can influence that).

Recommended answer

All three matches are exact (the term "car" is being matched, not 'ca' or 'ar') :)

If there is no more content in these fields ("car parts", "car" and "car shop"), then you could use lengthNorm() or computeNorm() (depending on the Lucene version) to give shorter fields more weight, so that "car" gets a higher score for being shorter. In Lucene 3.3.0, DefaultSimilarity.computeNorm() ends with this line:

return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));

where numTerms is the total number of terms in the field. So it is surprising that the "car" and "car shop" documents have the same score: for "car" the norm is 1, and for "car shop" it should be 1/sqrt(2), about 0.7 (assuming a boost of 1).
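To make the arithmetic concrete, here is a minimal stand-alone sketch that applies the same formula to the three example fields. It does not use Lucene at all: the class name LengthNormSketch and the helper countTerms are hypothetical, and counting terms by splitting on whitespace is an assumption that only happens to match what an analyzer would produce for these simple strings.

```java
import java.util.Locale;

/**
 * Stand-alone illustration of the length-norm formula quoted above.
 * This is NOT Lucene code; the names here are made up for the example.
 */
final class LengthNormSketch {

    /** Same arithmetic as the quoted line: boost * 1 / sqrt(numTerms). */
    static float lengthNorm(float boost, int numTerms) {
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    /** Hypothetical helper: counts whitespace-separated terms (an assumption). */
    static int countTerms(String fieldValue) {
        String trimmed = fieldValue.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String[] fields = {"car parts", "car", "car shop"};
        for (String field : fields) {
            int numTerms = countTerms(field);
            float norm = lengthNorm(1.0f, numTerms); // boost of 1, as assumed above
            System.out.println(String.format(Locale.ROOT,
                    "%-10s terms=%d norm=%.4f", field, numTerms, norm));
        }
    }
}
```

With a boost of 1, the one-term field gets a norm of 1 and the two-term fields get roughly 0.71, so "car" would be expected to score higher if norms are enabled for the field. Note that Lucene 3.x stores norms in a lossy single-byte encoding, so the values actually used at search time are coarser than the ones computed here.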

