Scoring by term position in ElasticSearch?


Problem description

I'm implementing an auto-complete index in ElasticSearch and have run into an issue with sorting/scoring. Say I have the following strings in an index:

apple banana coconut donut
apple banana donut durian
apple donut coconut durian
donut banana coconut durian

When I search for "donut", I want the results to be ordered by the position of the term, like so:

donut banana coconut durian
apple donut coconut durian
apple banana donut durian
apple banana coconut donut
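
To make the target ordering concrete, here is a small Python sketch (plain Python, not ElasticSearch code) that sorts the four sample strings by the word position of the search term. The helper name `term_position` is made up for illustration.

```python
def term_position(text, term):
    """Return the zero-based word index of `term` in `text`, or None if absent."""
    words = text.lower().split()
    term = term.lower()
    return words.index(term) if term in words else None


docs = [
    "apple banana coconut donut",
    "apple banana donut durian",
    "apple donut coconut durian",
    "donut banana coconut durian",
]

# Keep only documents that contain the term, earliest occurrence first.
ranked = sorted(
    (d for d in docs if term_position(d, "donut") is not None),
    key=lambda d: term_position(d, "donut"),
)

for d in ranked:
    print(d)
```

This is only a statement of the desired result; the open question is how to get ElasticSearch to produce the same order through scoring.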

I can't figure out how to make that happen. Term position isn't factored into the default scoring logic, and I can't find a way to bring it in. It seems like a common enough need that others must have run into it before. Has anyone figured it out?

Thanks!

Solution

Here's the solution I ended up with, based on Andrei's answer and expanded to support multiple search terms, plus additional scoring based on the length of the first word in the result:

First, define the following custom analyzer (it keeps the entire string as a single token and lowercases it):

"raw_analyzer": {
    "type": "custom",
    "filter": [
        "lowercase"
    ],
    "tokenizer": "keyword"
}
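
As a rough illustration of why this analyzer matters, the following Python sketch mimics what a keyword tokenizer plus lowercase filter produces, next to a simplified word-based analyzer. Both function names are hypothetical, and `word_analyze` is only a stand-in: the real `english` analyzer also stems words and drops stop words.

```python
def raw_analyze(text):
    """Mimic keyword tokenizer + lowercase filter: one lowercased token."""
    return [text.lower()]


def word_analyze(text):
    """Simplified stand-in for a word-based analyzer: one token per word."""
    return text.lower().split()


name = "Apple Banana Donut"
print(raw_analyze(name))   # a single token covering the whole string
print(word_analyze(name))  # one token per word
```

Because the raw field holds the whole name as one token, a prefix or wildcard match against it only succeeds when the name itself starts with the search text.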

Second, define your search field mapping like so (mine's named "name"):

"name": {
    "type": "string",
    "analyzer": "english",
    "fields": {
        "raw": {
            "type": "string",
            "index_analyzer": "raw_analyzer",
            "search_analyzer": "standard"
        }
    }
},
"_nameFirstWordLength": {
    "type": "long"
}
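
The analyzer and mapping fragments above do not show where they sit in the index definition. One plausible arrangement is sketched below as a Python dict, assuming the older `string` / `index_analyzer` mapping syntax used in this answer and a hypothetical type name `item`. Newer ElasticSearch releases use different field type and analyzer setting names, so check the documentation for your version.

```python
import json

# Assumption: older-style index body; "item" is a hypothetical type name.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "raw_analyzer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                    "tokenizer": "keyword",
                }
            }
        }
    },
    "mappings": {
        "item": {
            "properties": {
                "name": {
                    "type": "string",
                    "analyzer": "english",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index_analyzer": "raw_analyzer",
                            "search_analyzer": "standard",
                        }
                    },
                },
                "_nameFirstWordLength": {"type": "long"},
            }
        }
    },
}

# Serialize to JSON, ready to send as the body of an index-creation request.
print(json.dumps(index_body, indent=2))
```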

Third, when populating the index, compute _nameFirstWordLength for each document with logic like the following (mine's in C#):

_nameFirstWordLength = fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length
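
For readers not using C#, a rough Python equivalent of that line might look like this. The function name `first_word_length` is made up here, and the explicit check for blank names is an added assumption (the C# line would throw on a name with no words).

```python
def first_word_length(name):
    """Length of the first space-separated word, skipping empty pieces."""
    words = [w for w in name.split(" ") if w]
    if not words:
        # Assumption: reject blank names, since a stored length of 0 would
        # make a script that divides by this field fail at search time.
        raise ValueError("name has no words")
    return len(words[0])


print(first_word_length("Applebee's Grill"))
print(first_word_length("  Apple Pie"))
```

The value is stored with the document at index time so that it does not have to be derived from the name during scoring.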

Finally, do your search as follows:

{
   "query":{
      "bool":{
         "must":{
            "match_phrase_prefix":{
               "name":{
                  "query":"apple"
               }
            }
         },
         "should":{
            "function_score":{
               "query":{
                  "query_string":{
                     "fields":[
                        "name.raw"
                     ],
                     "query":"apple*"
                  }
               },
               "script_score":{
                  "script":"100/doc['_nameFirstWordLength'].value"
               },
               "boost_mode":"replace"
            }
         }
      }
   }
}
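
The request above hard-codes "apple". A minimal Python sketch of building the same body from arbitrary search text follows, including the pre-processing the author mentions (appending "*" to every word for the query_string clause only). The helper names `add_wildcards` and `build_query` are hypothetical, and the sketch assumes the search text contains no characters that are special in query_string syntax.

```python
import json


def add_wildcards(search_text):
    """Append '*' to every word, e.g. 'apple pie' -> 'apple* pie*'."""
    return " ".join(word + "*" for word in search_text.split())


def build_query(search_text):
    """Build the bool must/should request body for the given search text."""
    return {
        "query": {
            "bool": {
                # Every hit must match the text as a phrase prefix on "name".
                "must": {
                    "match_phrase_prefix": {"name": {"query": search_text}}
                },
                # Hits whose raw name starts with a search word get extra score.
                "should": {
                    "function_score": {
                        "query": {
                            "query_string": {
                                "fields": ["name.raw"],
                                "query": add_wildcards(search_text),
                            }
                        },
                        "script_score": {
                            "script": "100/doc['_nameFirstWordLength'].value"
                        },
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


print(json.dumps(build_query("apple pie"), indent=2))
```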

I'm using match_phrase_prefix so that partial matches are supported, such as "ap" matching "apple". The bool must/should combination, with the second query (a query_string against name.raw), gives a higher score to results whose name starts with one of the search terms (in my code I pre-process the search string, for that second query only, to add a "*" after every word). Finally, wrapping that second query in a function_score with a script based on _nameFirstWordLength causes the results up-scored by the second query to be further ordered by the length of their first word, shortest first (so Apple shows before Applebee's, for example).
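
The effect of the should clause can be approximated in a few lines of Python. This is a simplified model, not how ElasticSearch computes scores: it ignores the relevance score contributed by the must clause, and it assumes the script's division does not truncate to an integer. The names `first_word_length` and `should_bonus` are made up for the sketch.

```python
def first_word_length(name):
    """Length of the first whitespace-separated word of a non-empty name."""
    return len(name.split()[0])


def should_bonus(name, search_text):
    """Approximate the function_score clause: 100 / first-word length when the
    whole lowercased name starts with any search word, otherwise 0."""
    raw = name.lower()
    words = search_text.lower().split()
    if any(raw.startswith(w) for w in words):
        # Assumption: non-truncating division, as described in the lead-in.
        return 100 / first_word_length(name)
    return 0.0


names = ["Applebee's", "Green Apple Cafe", "Apple"]
ranked = sorted(names, key=lambda n: should_bonus(n, "app"), reverse=True)
print(ranked)
```

In this model a name that merely contains the term, such as "Green Apple Cafe", still matches overall but receives no bonus, while among names that start with the term the one with the shorter first word gets the larger bonus.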
