Scoring by term position in ElasticSearch?
Question
I'm implementing an auto-complete index in ElasticSearch and have run into an issue with sorting/scoring. Say I have the following strings in an index:
apple banana coconut donut
apple banana donut durian
apple donut coconut durian
donut banana coconut durian
When I search for "donut", I want the results to be ordered by the term location like so:
donut banana coconut durian
apple donut coconut durian
apple banana donut durian
apple banana coconut donut
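To make the desired ordering concrete, here is a small Python sketch that sorts the sample strings by the word index at which the search term appears. It runs outside ElasticSearch and only illustrates the expected result; the function name `rank_by_term_position` is made up for this example.

```python
def rank_by_term_position(docs, term):
    """Sort docs by the index of the first word equal to term (case-insensitive)."""
    term = term.lower()

    def position(doc):
        words = doc.lower().split()
        # Documents that do not contain the term sort last.
        return words.index(term) if term in words else len(words)

    return sorted(docs, key=position)


docs = [
    "apple banana coconut donut",
    "apple banana donut durian",
    "apple donut coconut durian",
    "donut banana coconut durian",
]
ranked = rank_by_term_position(docs, "donut")
for doc in ranked:
    print(doc)
```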
I can't figure out how to make that happen. Term position isn't factored into the default scoring logic, and I can't find a way to get it in there. It seems like a simple enough issue that others must have run into it before. Has anyone figured it out yet?
Thanks!
Recommended answer
Here's the solution I ended up with, based on Andrei's answer and expanded to support multiple search terms and additional scoring based on length of the first word in the result:
First, define the following custom analyzer (it keeps the entire string as a single token and lowercases it):
"raw_analyzer": {
  "type": "custom",
  "filter": [
    "lowercase"
  ],
  "tokenizer": "keyword"
}
Second, define your search field mapping like so (mine's named "name"):
"name": {
  "type": "string",
  "analyzer": "english",
  "fields": {
    "raw": {
      "type": "string",
      "index_analyzer": "raw_analyzer",
      "search_analyzer": "standard"
    }
  }
},
"_nameFirstWordLength": {
  "type": "long"
}
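For context, here is a rough Python sketch of how the two fragments above could fit into a single index-creation body. It assumes the older ElasticSearch layout that the "string" type and "index_analyzer" setting imply, where the analyzer goes under settings.analysis.analyzer and the fields go under a mapping type's properties. The type name "item" is a placeholder, not something from my actual index.

```python
import json

# Sketch of an index-creation body; "item" is a hypothetical mapping type name.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Keeps the whole string as one lowercased token.
                "raw_analyzer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                    "tokenizer": "keyword",
                }
            }
        }
    },
    "mappings": {
        "item": {
            "properties": {
                "name": {
                    "type": "string",
                    "analyzer": "english",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index_analyzer": "raw_analyzer",
                            "search_analyzer": "standard",
                        }
                    },
                },
                "_nameFirstWordLength": {"type": "long"},
            }
        }
    },
}

# Print the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```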
Third, when populating the index, use the following logic (mine's in C#) to set _nameFirstWordLength:
_nameFirstWordLength = fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length
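If you aren't indexing from C#, the same calculation might look like this in Python. This is a minimal sketch; the helper name `first_word_length` is hypothetical. It splits on spaces only and drops empty entries to mirror the C# call above, and it raises an error for a name with no words, where the C# version would throw on the [0] index.

```python
def first_word_length(name):
    """Return the length of the first space-separated word in name."""
    # Split on spaces and drop empty entries, like StringSplitOptions.RemoveEmptyEntries.
    words = [word for word in name.split(" ") if word]
    if not words:
        raise ValueError("name contains no words")
    return len(words[0])


print(first_word_length("Apple Store"))
print(first_word_length("Applebee's Grill"))
```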
Finally, do your search as follows:
{
  "query": {
    "bool": {
      "must": {
        "match_phrase_prefix": {
          "name": {
            "query": "apple"
          }
        }
      },
      "should": {
        "function_score": {
          "query": {
            "query_string": {
              "fields": [
                "name.raw"
              ],
              "query": "apple*"
            }
          },
          "script_score": {
            "script": "100/doc['_nameFirstWordLength'].value"
          },
          "boost_mode": "replace"
        }
      }
    }
  }
}
I'm using match_phrase_prefix so that partial matches are supported, such as "ap" matching "apple". The bool must/should with that second query_string query against name.raw gives a higher score to results whose name starts with one of the search terms (in my code I'm pre-processing the search string, just for that second query, to add a "*" after every word). Finally, wrapping that second query in a function_score script that uses the value of _nameFirstWordLength causes the results up-scored by the second query to be further sorted by the length of their first word (causing Apple to show before Applebee's, for example).
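The pre-processing step and the query assembly could be sketched in Python roughly like this. The helper names `add_wildcards` and `build_query` are made up for illustration, and the sketch does not escape query_string special characters, which real user input would probably need.

```python
def add_wildcards(search):
    """Append "*" to every word, for the query_string query against name.raw."""
    return " ".join(word + "*" for word in search.split())


def build_query(search):
    """Build the bool must/should query body shown above for a search string."""
    return {
        "query": {
            "bool": {
                # Partial matching against the analyzed "name" field.
                "must": {"match_phrase_prefix": {"name": {"query": search}}},
                # Extra score for names that start with a search term,
                # with shorter first words scoring higher.
                "should": {
                    "function_score": {
                        "query": {
                            "query_string": {
                                "fields": ["name.raw"],
                                "query": add_wildcards(search),
                            }
                        },
                        "script_score": {
                            "script": "100/doc['_nameFirstWordLength'].value"
                        },
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


print(add_wildcards("apple pie"))
body = build_query("apple")
```

The returned dict can be serialized with json.dumps and sent as the search request body.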