Lucene .net Boost not working when using * wildcard

Question

I have two documents, which I inspected using Luke. I have also confirmed in code, using StandardAnalyzer, that the behavior is the same.

Document one, with boost 1:

stored/uncompressed,indexed,tokenized<Description:Nummer ett>
stored/uncompressed,indexed,tokenized<Id:2>
stored/uncompressed,indexed,tokenized<Name:Apa>

Document two, with boost 2:

stored/uncompressed,indexed,tokenized<Description:Nummer två>
stored/uncompressed,indexed,tokenized<Id:1>
stored/uncompressed,indexed,tokenized<Name:Apa>

Searching for apa in the field Name returns both documents with the boost applied and in the correct order:

Document 2 has Score 1.1891
Document 1 has Score 0.5945

Searching for ap* returns the documents in no particular order and with the same score:

Document 1 Score 1.0000
Document 2 Score 1.0000

Searching for apa* also returns the documents in no particular order and with the same score:

Document 1 Score 1.0000
Document 2 Score 1.0000

Why is this? I would like documents with a higher boost value to rank first even when I have to use wildcards. Is this possible?

Cheers all cool coders out there!

This is my scenario.

I have a search string and want matches using a wildcard, for example searching for "Lu" + "*".

Document
 Name
 City

For example, I would like the document whose Name is Lund to get a higher rating than a document whose Name is Lunt or whose City is Lund. This is because I will know which documents are the most popular. I want to get the documents with City Stockholm and with Names Stockholm and Stockholmen, but ordered as I choose.

Recommended answer

Since WildcardQuery is a subclass of MultiTermQuery, which by default is rewritten into a constant-score query, every matching document gets a constant score of 1 and the boost has no effect on the ranking.

If you check the definition of t.getBoost():

t.getBoost() is a search time boost of term t in the query q as specified in the query text (see query syntax), or as set by application calls to setBoost(). Notice that there is really no direct API for accessing a boost of one term in a multi term query, but rather multi terms are represented in a query as multi TermQuery objects, and so the boost of a term in the query is accessible by calling the sub-query getBoost().

http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/search/Similarity.html#formula_termBoost
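For context, the practical scoring formula documented on that page can be written out as follows. This is a transcription of the Lucene 3.0.x Similarity documentation, added here for illustration and not part of the original answer:

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot
  t.\mathrm{getBoost}()\cdot \mathrm{norm}(t,d) \Big)

\mathrm{norm}(t,d) = \mathrm{doc.getBoost}()\cdot \mathrm{lengthNorm}(\mathit{field})\cdot
  \prod_{\text{field } f \in d \text{ named as } t} f.\mathrm{getBoost}()
```

The index-time document boost enters the score only through norm(t,d). That matches the exact-term search in the question: the scores 1.1891 and 0.5945 differ by a factor of about 2, the ratio of the two document boosts. A constant-score rewrite does not evaluate this sum at all, so the norm, and with it the boost, never reaches the score.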

One possible hack is to set the rewrite method of the query parser, so that wildcard queries are expanded into a scoring BooleanQuery instead of a constant-score query:

myCustomQueryParser.SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
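If the query is built in code rather than through a query parser, the same rewrite method can presumably be set on the WildcardQuery itself. The following is a minimal sketch, not a tested implementation: the class and method names (WildcardScoring, BuildScoringWildcard) are hypothetical, and it assumes the Lucene.Net 2.9.x method-style API that the SetMultiTermRewriteMethod call above suggests.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, for illustration only.
public static class WildcardScoring
{
    // Builds a wildcard query that is rewritten into a BooleanQuery of
    // TermQuery clauses, so each match is scored with tf, idf and norms
    // (which carry the index-time boost) instead of a constant score.
    public static Query BuildScoringWildcard(string field, string pattern)
    {
        var query = new WildcardQuery(new Term(field, pattern));

        // Assumption: Lucene.Net 2.9.x style setter. In Lucene.Net 3.0.3
        // this is exposed as a property instead:
        //   query.RewriteMethod = MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE;
        query.SetRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

        return query;
    }
}
```

For the question's index this would be called as WildcardScoring.BuildScoringWildcard("Name", "ap*"). The pattern is lowercase because a WildcardQuery built in code is not analyzed, while StandardAnalyzer lowercases terms at index time. One trade-off to keep in mind: a scoring boolean rewrite adds one clause per matching term, so a very broad wildcard can exceed the BooleanQuery maximum clause count (1024 by default) and throw a TooManyClauses exception.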
