What's the denominator for ElasticSearch scores?


Question

I have a search with multiple criteria.

Each criterion (grouped under should) has a different weighted score.

ElasticSearch returns a list of results, each with a score, which seems arbitrary to me because I can't find a denominator for it.

My question is: how can I represent each score as a ratio?

Dividing each score by max_score would not work, since it would show the best match as a 100% match with the search criteria.

Recommended answer

The _score calculation depends on the combination of queries used. For instance, a simple query like:

{ "match": { "title": "search" }}

would use Lucene's TFIDFSimilarity (TF/IDF was the default in Elasticsearch before 5.0; later versions default to BM25), combining:


  • term frequency (TF): how many times does the term search appear in the title field of this document? The more often, the higher the score.

  • inverse document frequency (IDF): how many times does the term search appear in the title field of all documents in the index? The more often, the lower the score.

  • field norm: how long is the title field? The longer the field, the lower the score. (Shorter fields like title are considered more important than longer fields like body.)

  • a query normalization factor (can be ignored).
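To make the factors above concrete, here is a minimal Python sketch of a single-term TF/IDF score. The formulas (square-root term frequency, logarithmic IDF, inverse-square-root field length) follow the general shape of Lucene's classic similarity, but they are simplified, the exact expressions differ between Lucene versions, and the query normalization factor is left out. The function names are made up for illustration.

```python
import math


def term_frequency(freq: int) -> float:
    """More occurrences of the term in the field raise the score (with diminishing returns)."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """Terms that appear in many documents contribute less."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms: int) -> float:
    """Longer fields lower the score."""
    return 1.0 / math.sqrt(num_terms)


def simple_score(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified single-term score: TF * IDF * field norm (assumption: no query norm)."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_norm(num_terms)
    )
```

Because the IDF factor in this sketch keeps growing with the size of the index, a score built this way has no fixed upper bound, which is one reason there is no natural denominator.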

On the other hand, a bool query like this:

{
    "bool": {
        "should": [
            { "match": { "title": "foo" }},
            { "match": { "title": "bar" }},
            { "match": { "title": "baz" }}
        ]
    }
}

would calculate the _score for each clause that matches, add them together, then divide by the total number of clauses (with the query normalization factor applied once again).
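The combination step described above can be sketched in a few lines of Python. This is only a literal rendering of that description, not Lucene's actual code, and the real details vary by version; the function name and the use of None for a clause that did not match are assumptions made for the example.

```python
from typing import Optional, Sequence


def bool_should_score(clause_scores: Sequence[Optional[float]]) -> float:
    """Combine per-clause scores; None marks a clause that did not match."""
    if not clause_scores:
        return 0.0
    # Add up the scores of the clauses that matched...
    matched = [s for s in clause_scores if s is not None]
    # ...then divide by the total number of clauses, matched or not.
    return sum(matched) / len(clause_scores)
```

For example, bool_should_score([1.5, None, 0.5]) adds the two matching scores and divides by three clauses. Each per-clause score is itself unbounded, so the combined value still has no fixed maximum.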

So it depends entirely on which queries you are using.

You can get a detailed explanation of how the _score was calculated by adding the explain parameter to your query:

curl 'localhost:9200/_search?explain' -H 'Content-Type: application/json' -d '
{
    "query": ....
}'
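With explain enabled, each hit carries an _explanation object: a tree of nodes with value, description, and details fields. A small Python helper along these lines could make that tree easier to read. The helper name and the sample data are hypothetical; the sample is hand-written, not real Elasticsearch output.

```python
def format_explanation(node: dict, depth: int = 0) -> list:
    """Flatten an _explanation tree into indented 'value  description' lines."""
    line = "{}{:.4f}  {}".format("  " * depth, node["value"], node["description"])
    lines = [line]
    # Each child in "details" explains one component of the parent's value.
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Hypothetical, hand-written sample; real trees come from hit["_explanation"].
sample = {
    "value": 0.5,
    "description": "sum of:",
    "details": [
        {"value": 0.3, "description": "weight(title:foo)", "details": []},
        {"value": 0.2, "description": "weight(title:bar)", "details": []},
    ],
}

for line in format_explanation(sample):
    print(line)
```

Reading the tree this way shows which clauses contributed to a score and by how much, which is usually the quickest way to see why no single denominator applies.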




"My question is - how can I represent each score as a ratio?"

Without understanding what you want your query to do, it is impossible to answer this. Depending on your use case, you could use the function_score query to implement your own scoring algorithm.
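As one possible reading of that suggestion, the Python sketch below builds a function_score request body in which each criterion contributes a fixed weight when it matches. It assumes score_mode "sum" and boost_mode "replace", so that the resulting _score should be the sum of the weights of the matching criteria and the sum of all weights can serve as the denominator. The helper names, criteria, and weights are hypothetical, and whether this fits depends on what the query is meant to do.

```python
def build_weighted_query(criteria):
    """Build a request body; criteria is a list of (query_clause, weight) pairs."""
    return {
        "query": {
            "function_score": {
                # Only return documents that match at least one criterion.
                "query": {
                    "bool": {
                        "should": [clause for clause, _ in criteria],
                        "minimum_should_match": 1,
                    }
                },
                # Each matching criterion contributes its fixed weight.
                "functions": [
                    {"filter": clause, "weight": weight}
                    for clause, weight in criteria
                ],
                "score_mode": "sum",      # add the weights of matching functions
                "boost_mode": "replace",  # ignore the relevance score of the inner query
            }
        }
    }


def score_ratio(score, criteria):
    """Express a score as a fraction of the maximum attainable weight."""
    total = sum(weight for _, weight in criteria)
    return score / total if total else 0.0


# Hypothetical criteria and weights, for illustration only.
criteria = [
    ({"match": {"title": "foo"}}, 3.0),
    ({"match": {"title": "bar"}}, 2.0),
    ({"match": {"title": "baz"}}, 1.0),
]
body = build_weighted_query(criteria)
```

Under these assumptions, a document matching the first two criteria would score 5.0, and score_ratio(5.0, criteria) expresses that as 5/6 of the maximum. The trade-off is that term frequency and field length no longer influence the ranking.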
