Scoring by term position in ElasticSearch?

Views: 139 | Published: 2017/8/7 1:22:48 | Tag: elasticsearch

Question

I'm implementing an auto-complete index in ElasticSearch and have run into an issue with sorting/scoring. Say I have the following strings in an index:

apple banana coconut donut
apple banana donut durian
apple donut coconut durian
donut banana coconut durian

When I search for "donut", I want the results to be ordered by the term location like so:

donut banana coconut durian
apple donut coconut durian
apple banana donut durian
apple banana coconut donut

I can't figure out how to make that happen. Term position isn't factored into the default scoring logic, and I can't find a way to get it in there. It seems like a simple enough issue that others must have run into it before. Has anyone figured it out yet?
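
To make the desired ordering concrete, here is a small client-side Python sketch (not an Elasticsearch query) that ranks the four sample strings by the word index of the search term. The helper names term_position and rank_by_position are hypothetical, and whitespace tokenization with case-insensitive matching is an assumption:

```python
from typing import List, Optional


def term_position(text: str, term: str) -> Optional[int]:
    """Return the zero-based word index of `term` in `text`, or None if absent."""
    words = text.lower().split()
    try:
        return words.index(term.lower())
    except ValueError:
        return None


def rank_by_position(docs: List[str], term: str) -> List[str]:
    """Keep only the docs that contain `term`, earliest occurrence first."""
    hits = [(term_position(doc, term), doc) for doc in docs]
    matching = [(pos, doc) for pos, doc in hits if pos is not None]
    # sorted() is stable, so docs with equal positions keep their input order.
    return [doc for _, doc in sorted(matching, key=lambda pair: pair[0])]


docs = [
    "apple banana coconut donut",
    "apple banana donut durian",
    "apple donut coconut durian",
    "donut banana coconut durian",
]

for doc in rank_by_position(docs, "donut"):
    print(doc)
```

This only states the target behavior; the open question is how to get Elasticsearch itself to score this way.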

Thanks!

Solution

Here's the solution I ended up with, based on Andrei's answer and expanded to support multiple search terms and additional scoring based on length of the first word in the result:

First, define the following custom analyzer (it keeps the entire string as a single token and lowercases it):

"raw_analyzer": {
    "type": "custom",
    "filter": [
        "lowercase"
    ],
    "tokenizer": "keyword"
}
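
As a rough illustration of what this analyzer produces, the Python sketch below emulates a keyword tokenizer followed by a lowercase filter, then checks a prefix wildcard against the resulting single token. It is a plain-Python emulation, not Elasticsearch code, and the names raw_analyze and starts_with_term are hypothetical:

```python
from typing import List


def raw_analyze(text: str) -> List[str]:
    """Emulate a keyword tokenizer followed by a lowercase filter.

    The keyword tokenizer emits the whole input as one token; the
    lowercase filter then lowercases that token.
    """
    return [text.lower()]


def starts_with_term(text: str, term: str) -> bool:
    """Emulate matching the prefix wildcard `term*` against the single raw token."""
    (token,) = raw_analyze(text)
    return token.startswith(term.lower())


docs = [
    "apple banana coconut donut",
    "apple banana donut durian",
    "apple donut coconut durian",
    "donut banana coconut durian",
]

print(raw_analyze("Donut Banana Coconut Durian"))
print([doc for doc in docs if starts_with_term(doc, "donut")])
```

Because the whole string is one token, a prefix wildcard can only match at the very start of the name, which is what lets a query against this field single out names that begin with the search term.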

Second, define your search field mapping like so (mine's named "name"):

"name": {
    "type": "string",
    "analyzer": "english",
    "fields": {
        "raw": {
            "type": "string",
            "index_analyzer": "raw_analyzer",
            "search_analyzer": "standard"
        }
    }
},
"_nameFirstWordLength": {
    "type": "long"
}

Third, when populating the index, use the following logic (mine's in C#) to set _nameFirstWordLength:

_nameFirstWordLength = fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length
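
For readers not using C#, a Python sketch of the same computation might look like this. The function name first_word_length is hypothetical, and raising on a blank name is an assumption (the C# expression above would throw an index error in that case):

```python
def first_word_length(name: str) -> int:
    """Length of the first space-separated word, ignoring empty entries."""
    # Splitting on a single space and dropping empty strings mirrors
    # Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries).
    words = [word for word in name.split(" ") if word]
    if not words:
        # Assumption: treat a blank name as an error rather than storing 0,
        # since the search script divides by this value.
        raise ValueError("name contains no words")
    return len(words[0])


print(first_word_length("Apple Pie"))
print(first_word_length("Applebee's Grill"))
```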

Finally, do your search as follows:

{
   "query":{
      "bool":{
         "must":{
            "match_phrase_prefix":{
               "name":{
                  "query":"apple"
               }
            }
         },
         "should":{
            "function_score":{
               "query":{
                  "query_string":{
                     "fields":[
                        "name.raw"
                     ],
                     "query":"apple*"
                  }
               },
               "script_score":{
                  "script":"100/doc['_nameFirstWordLength'].value"
               },
               "boost_mode":"replace"
            }
         }
      }
   }
}

I'm using match_phrase_prefix so that partial matches are supported, such as "ap" matching "apple". The bool must/should with that second query_string query against name.raw gives a higher score to results whose name starts with one of the search terms (in my code I'm pre-processing the search string, just for that second query, to add a "*" after every word). Finally, wrapping that second query in a function_score script that uses the value of _nameFirstWordLength causes the results up-scored by the second query to be further sorted by the length of their first word (causing Apple to show before Applebee's, for example).
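
The pre-processing step mentioned above could be sketched in Python roughly as follows. The function names add_wildcards and build_query are hypothetical, the sketch assumes words are separated by whitespace, and it does not escape characters that are special in query_string syntax:

```python
import json
from typing import Any, Dict


def add_wildcards(search: str) -> str:
    """Append '*' to every whitespace-separated word: 'apple pie' -> 'apple* pie*'."""
    return " ".join(word + "*" for word in search.split())


def build_query(search: str) -> Dict[str, Any]:
    """Build the search body shown above for an arbitrary search string."""
    return {
        "query": {
            "bool": {
                # The original search string drives the partial match.
                "must": {"match_phrase_prefix": {"name": {"query": search}}},
                # Only this second query receives the wildcarded string.
                "should": {
                    "function_score": {
                        "query": {
                            "query_string": {
                                "fields": ["name.raw"],
                                "query": add_wildcards(search),
                            }
                        },
                        "script_score": {
                            "script": "100/doc['_nameFirstWordLength'].value"
                        },
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


print(json.dumps(build_query("apple pie"), indent=2))
```

The resulting dictionary can be serialized and sent as the search request body by whatever client is in use.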
