Custom score for exact, phonetic and fuzzy matching in Elasticsearch

Question

I have a requirement for custom scoring on names. To keep it simple, let's say that if I search for 'Smith' against the names in the index, the logic should be:

if input = exact 'Smith' then score = 100%
else
 if input = phonetic match then
   score = <depending upon fuzziness match of input with name>% 
 end if
end if;
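
Outside Elasticsearch, the pseudocode above can be made concrete to check what scores it should produce. The following is a minimal Python sketch under stated assumptions: American Soundex stands in for "phonetic match" (the question does not name an algorithm), the fuzzy percentage is taken as 1 - distance / length of the longer name, and names that are not phonetic matches score 0. The function names soundex, levenshtein and name_score are hypothetical helpers, not an Elasticsearch API.

```python
# Sketch of the question's scoring rule (not an Elasticsearch API).
# Assumptions: Soundex = "phonetic match", percentage = 1 - distance / longer
# length, non-phonetic names score 0.

def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, e.g. 'Smith' -> 'S530'."""
    groups = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3",
              "L": "4", "MN": "5", "R": "6"}
    codes = {ch: digit for letters, digit in groups.items() for ch in letters}
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        # H and W keep the previous code; any other letter replaces it
        # (vowels have no code, so they clear it).
        if ch not in "HW":
            prev = digit
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def name_score(query: str, name: str) -> float:
    """Score a stored name against the search input as a percentage."""
    if query == name:                    # exact match
        return 100.0
    if soundex(query) == soundex(name):  # phonetic match
        q, n = query.lower(), name.lower()
        distance = levenshtein(q, n)
        return round(100.0 * (1 - distance / max(len(q), len(n))), 1)
    return 0.0                           # assumption: no match scores 0


for candidate in ["Smith", "Smyth", "Smithe", "Jones"]:
    print(candidate, name_score("Smith", candidate))
```

Running it prints 100.0 for Smith, 80.0 for Smyth, 83.3 for Smithe and 0.0 for Jones. Inside Elasticsearch the same decision would have to live in a script or be approximated with queries; this sketch only pins down the intended numbers.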

I'm able to search documents with a fuzziness of 1, but I don't know how to give them a custom score depending on how fuzzy the match is. Thanks!

Update: I found a post with the same requirement as mine, in which the person mentioned solving it with native scripts. My question remains: how do I actually get a score based on the similarity distance, so that it can be used in a native script?

For reference, the post: https://discuss.elastic.co/t/fuzzy-query-scoring-based-levenshtein-distance/11116

The text to look for in the post: "For future readers I solved this issue by creating a custom score query and writing a (native) script to handle the scoring."
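
One detail matters when turning the similarity distance into a score. Elasticsearch's fuzzy query counts a swap of two adjacent characters as a single edit by default (the fuzzy query's transpositions parameter and the match query's fuzzy_transpositions parameter both default to true). So "Simth" is within fuzziness 1 of "Smith" even though their plain Levenshtein distance is 2. Below is a sketch of the optimal string alignment (restricted Damerau-Levenshtein) distance, which also counts an adjacent swap as one edit, plus a percentage derived from it. A script could mirror this logic. The percentage formula and the function names are assumptions for illustration, not something Elasticsearch provides.

```python
# Sketch: optimal string alignment distance (adjacent swap = 1 edit) and an
# assumed percentage formula. Neither function is part of Elasticsearch.

def osa_distance(a: str, b: str) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity_percent(query: str, name: str) -> float:
    """100 for identical names, falling linearly with the edit distance."""
    q, n = query.lower(), name.lower()
    longest = max(len(q), len(n))
    if longest == 0:
        return 100.0
    return round(100.0 * (1 - osa_distance(q, n) / longest), 1)


print(osa_distance("smith", "simth"))       # a single adjacent swap
print(similarity_percent("Smith", "Simth"))
```

Because the distance never exceeds the length of the longer name, the percentage stays between 0 and 100.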

Answer

You can implement this search logic with a function score query, which adjusts the scores of the documents matched by an inner query.

Here is a possible example:

    {
      "query": {
        "function_score": {
          "query": {
            "match": { "input": "Smith" }
          },
          "boost": 5,
          "functions": [
            {
              "filter": { "term": { "input.keyword": "Smith" } },
              "weight": 23
            }
          ]
        }
      }
    }

In this example the mapping indexes the input field both as text and as keyword (input.keyword is used for the exact match). Documents that exactly match the term "Smith" get a higher score than the other documents matched by the main query (a match query in this example; in your case it would be the query with fuzziness).

You can control the size of this effect by tuning the weight parameter.
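
The following is a rough Python model of the arithmetic behind the weight. It rests on several assumptions. score_mode and boost_mode are left at their defaults (both "multiply"). The exact-match function contributes only its weight. A document matched by no function keeps a function factor of 1. The top-level boost multiplies the final score. The query scores are made-up numbers, and combined_score is a hypothetical helper, not Elasticsearch code.

```python
# Rough model of the function_score arithmetic (assumptions in the lead-in).
# Query scores are invented for illustration only.

def combined_score(query_score: float, exact_match: bool,
                   weight: float = 23.0, boost: float = 5.0) -> float:
    factor = weight if exact_match else 1.0  # function factor (1 if no match)
    return query_score * factor * boost      # boost_mode multiply, then boost


# Hypothetical relevance scores: (score from the match query, exact match?)
docs = {"Smith": (1.25, True), "John Smith": (1.5, False)}
ranked = sorted(docs, key=lambda d: combined_score(*docs[d]), reverse=True)
for name in ranked:
    print(name, combined_score(*docs[name]))
```

Under these assumptions the weight decides how far exact matches pull ahead of the rest. The boost of 5 multiplies every document's score equally, so on its own it does not change the ranking.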
