Scoring by term position in ElasticSearch?
Problem description
I'm implementing an auto-complete index in ElasticSearch and have run into an issue with sorting/scoring. Say I have the following strings in an index:
apple banana coconut donut
apple banana donut durian
apple donut coconut durian
donut banana coconut durian
When I search for "donut", I want the results to be ordered by the term location like so:
donut banana coconut durian
apple donut coconut durian
apple banana donut durian
apple banana coconut donut
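To pin down the ordering being asked for, here is a minimal client-side sketch in Python. It is not an Elasticsearch feature, and the function name `rank_by_term_position` is made up for illustration. It assumes whitespace-separated words and case-insensitive matching, and simply sorts the sample strings by the word index at which the search term first appears.

```python
def rank_by_term_position(docs, term):
    """Sort docs so that those containing `term` earliest come first.

    Illustration only: words are split on whitespace and compared
    case-insensitively. Docs without the term sort last.
    """
    needle = term.lower()

    def position(doc):
        words = doc.lower().split()
        # Use the word index of the first occurrence, or push to the end.
        return words.index(needle) if needle in words else len(words)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(docs, key=position)


docs = [
    "apple banana coconut donut",
    "apple banana donut durian",
    "apple donut coconut durian",
    "donut banana coconut durian",
]

for line in rank_by_term_position(docs, "donut"):
    print(line)
```

This only describes the target behaviour; the goal is to get Elasticsearch to produce the same order through scoring.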
I can't figure out how to make that happen. Term position isn't factored into the default scoring logic, and I can't find a way to get it in there. It seems like a simple enough issue that others must have run into it before. Has anyone figured it out yet?
Thanks!
Here's the solution I ended up with, based on Andrei's answer and expanded to support multiple search terms and additional scoring based on length of the first word in the result:
First, define the following custom analyzer (it keeps the entire string as a single token and lowercases it):
"raw_analyzer": {
    "type": "custom",
    "filter": [
        "lowercase"
    ],
    "tokenizer": "keyword"
}
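As a rough illustration of what this analyzer produces, the sketch below imitates it in plain Python. It is not Elasticsearch's implementation: `raw_analyze` and `word_analyze` are hypothetical helper names, and Python's `str.lower()` may differ from Lucene's lowercase filter on unusual characters. The point is that the whole string becomes one token, so a prefix match on that token can only succeed at the very start of the name.

```python
def raw_analyze(text):
    """Imitate keyword tokenizer + lowercase filter: one lowercased token."""
    return [text.lower()]


def word_analyze(text):
    """Rough stand-in for a word-splitting analyzer, shown for contrast only."""
    return text.lower().split()


name = "Donut Banana Coconut Durian"
print(raw_analyze(name))   # the whole name as a single token
print(word_analyze(name))  # one token per word

# A prefix such as "donut" matches the single raw token only when the
# name itself begins with it.
print(raw_analyze(name)[0].startswith("donut"))
print(raw_analyze("Apple Donut")[0].startswith("donut"))
```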
Second, define your search field mapping like so (mine's named "name"):
"name": {
    "type": "string",
    "analyzer": "english",
    "fields": {
        "raw": {
            "type": "string",
            "index_analyzer": "raw_analyzer",
            "search_analyzer": "standard"
        }
    }
},
"_nameFirstWordLength": {
    "type": "long"
}
Third, when populating the index, compute the length of the first word of each name and store it in _nameFirstWordLength (mine's in C#):
_nameFirstWordLength = fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length
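For readers not using C#, the same computation might look like this in Python. The function name `first_word_length` is an assumption for illustration. Like the C# line, it splits on single spaces, drops empty entries, and fails on a name that contains no words at all.

```python
def first_word_length(name):
    """Length of the first space-separated word in `name`.

    Mirrors Split(' ', RemoveEmptyEntries)[0].Length: empty pieces
    produced by leading or repeated spaces are discarded first.
    Raises IndexError if `name` is empty or contains only spaces.
    """
    words = [word for word in name.split(" ") if word]
    return len(words[0])


print(first_word_length("Apple Pie"))
print(first_word_length("Applebee's Grill"))
```

If names can be empty, it may be worth guarding against that case before indexing rather than letting the indexer fail.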
Finally, do your search as follows:
{
    "query": {
        "bool": {
            "must": {
                "match_phrase_prefix": {
                    "name": {
                        "query": "apple"
                    }
                }
            },
            "should": {
                "function_score": {
                    "query": {
                        "query_string": {
                            "fields": [
                                "name.raw"
                            ],
                            "query": "apple*"
                        }
                    },
                    "script_score": {
                        "script": "100/doc['_nameFirstWordLength'].value"
                    },
                    "boost_mode": "replace"
                }
            }
        }
    }
}
I'm using match_phrase_prefix so that partial matches are supported, such as "ap" matching "apple". The second query, a query_string against name.raw in the bool's should clause, gives a higher score to results whose name starts with one of the search terms (in my code I pre-process the search string, just for that second query, to add a "*" after every word). Finally, wrapping that second query in a function_score whose script uses the value of _nameFirstWordLength causes the results up-scored by the second query to be further sorted by the length of their first word: with boost_mode set to replace, that clause contributes 100/5 = 20 for Apple and 100/10 = 10 for Applebee's, so Apple shows first.
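The pre-processing step and the request body can be sketched together in Python. This is a minimal illustration, not the code from my project: `add_wildcards` and `build_query` are hypothetical names, and the sketch assumes the search string contains no characters that are reserved in query_string syntax (those would need escaping first).

```python
import json


def add_wildcards(search):
    """Append "*" to every whitespace-separated word, for name.raw only."""
    return " ".join(word + "*" for word in search.split())


def build_query(search):
    """Build the request body shown above for an arbitrary search string."""
    return {
        "query": {
            "bool": {
                # Required clause: supports partial matches like "ap".
                "must": {"match_phrase_prefix": {"name": {"query": search}}},
                # Optional clause: extra score when the name starts with a term.
                "should": {
                    "function_score": {
                        "query": {
                            "query_string": {
                                "fields": ["name.raw"],
                                "query": add_wildcards(search),
                            }
                        },
                        # Shorter first words get a larger score.
                        "script_score": {
                            "script": "100/doc['_nameFirstWordLength'].value"
                        },
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


print(json.dumps(build_query("apple pie"), indent=2))
```

The resulting dictionary can be serialized and sent as the body of a search request with whichever HTTP or Elasticsearch client is in use.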