What's the denominator for ElasticSearch scores?
Question
I have a search with multiple criteria.
Each criterion (grouped by should) has a different weighted score.
ElasticSearch returns a list of results, each with a score that seems arbitrary to me, because I can't find a denominator for that score.
My question is - how can I represent each score as a ratio?
Dividing each score by max_score would not work, since it would show the best match as a 100% match with the search criteria.
Recommended answer
The _score calculation depends on the combination of queries used. For instance, a simple query like:
{ "match": { "title": "search" }}
would use Lucene's TFIDFSimilarity, combining:
- term frequency (TF): how many times does the term search appear in the title field of this document? The more often, the higher the score.
- inverse document frequency (IDF): how many times does the term search appear in the title field of all documents in the index? The more often, the lower the score.
- field norm: how long is the title field? The longer the field, the lower the score. (Shorter fields like title are considered to be more important than longer fields like body.)
- a query normalization factor (can be ignored).
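The direction in which each factor pushes the score can be sketched in a few lines of Python. This is only an illustration: the formulas are modeled on Lucene's classic TF/IDF similarity (square-root term frequency, logarithmic IDF, inverse square-root length norm), the function names are made up for this example, and the query normalization factor is left out. The exact formulas vary between Lucene versions, so the numbers will not match real _score values.

```python
import math


def term_frequency(freq: int) -> float:
    """More occurrences of the term in the field -> higher value."""
    return math.sqrt(freq)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """Term present in more documents of the index -> lower value."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms: int) -> float:
    """Longer field (more terms) -> lower value."""
    return 1.0 / math.sqrt(num_terms)


def simple_match_score(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified product of the three factors (query normalization ignored)."""
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_norm(num_terms)
    )


if __name__ == "__main__":
    # Hypothetical index statistics, chosen only to compare two documents.
    short_title = simple_match_score(freq=1, num_docs=1000, doc_freq=10, num_terms=4)
    long_title = simple_match_score(freq=1, num_docs=1000, doc_freq=10, num_terms=16)
    print(short_title > long_title)  # the shorter field scores higher
```

None of these factors has a fixed upper bound, which is why a raw _score has no natural denominator.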
On the other hand, a bool query like this:
"bool": {
  "should": [
    { "match": { "title": "foo" }},
    { "match": { "title": "bar" }},
    { "match": { "title": "baz" }}
  ]
}
would calculate the _score for each clause which matches, add them together, multiply by the number of matching clauses, then divide by the total number of clauses (and once again have the query normalization factor applied).
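As a rough sketch of that arithmetic, the snippet below combines per-clause scores the way classic Lucene scoring is generally described: the scores of the matching clauses are summed, and the sum is scaled by the fraction of clauses that matched (the coordination factor). The function name and the sample scores are hypothetical, the query normalization factor is ignored, and other Lucene versions may combine clauses differently, so treat it as an illustration rather than the actual implementation.

```python
def bool_should_score(matching_clause_scores, total_clauses):
    """Combine the scores of the should clauses that matched.

    matching_clause_scores: scores of only the clauses that matched.
    total_clauses: number of should clauses in the bool query.
    """
    if total_clauses <= 0:
        raise ValueError("total_clauses must be positive")
    matched = len(matching_clause_scores)
    if matched > total_clauses:
        raise ValueError("more matching clauses than clauses in the query")
    # Sum of matching clause scores, scaled by matched / total.
    return sum(matching_clause_scores) * matched / total_clauses


if __name__ == "__main__":
    # Hypothetical scores: two of the three clauses (foo, bar, baz) matched.
    print(bool_should_score([1.0, 3.0], total_clauses=3))
    # All three clauses matched, so no down-scaling is applied.
    print(bool_should_score([1.0, 3.0, 2.0], total_clauses=3))
```

A document that matches every clause is not penalized, while one that matches only some of them has its summed score reduced in proportion.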
So it depends entirely on what queries you are using.
You can get a detailed explanation of how the _score was calculated by adding the explain parameter to your query:
curl 'localhost:9200/_search?explain' -H 'Content-Type: application/json' -d '
{
  "query": ....
}'
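Each hit in the response then carries an _explanation object, which is a tree of nodes with value, description and details keys. A small helper like the one below can flatten that tree into indented lines so the contributing factors are easier to read. The helper name is invented for this example, and the sample tree is hand-written for illustration; it is not real Elasticsearch output.

```python
def format_explanation(node, depth=0):
    """Return the explanation tree as a list of indented text lines."""
    lines = ["  " * depth + f"{node['value']} : {node['description']}"]
    # "details" holds the child explanations; it may be missing or empty.
    for child in node.get("details") or []:
        lines.extend(format_explanation(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical explanation tree with made-up values and descriptions.
    sample_explanation = {
        "value": 0.5,
        "description": "sum of:",
        "details": [
            {"value": 0.25, "description": "weight(title:foo), made-up value", "details": []},
            {"value": 0.25, "description": "weight(title:bar), made-up value", "details": []},
        ],
    }
    for line in format_explanation(sample_explanation):
        print(line)
```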
My question is - how can I represent each score as a ratio?
Without understanding what you want your query to do, it is impossible to answer this. Depending on your use case, you could use the function_score query to implement your own scoring algorithm.
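One possible way to get a score with a known denominator, offered here as a sketch and not as the answer's own method, is to give each criterion a fixed weight in a function_score query, sum the weights of the functions whose filters match (score_mode sum), and let that sum replace the relevance score (boost_mode replace). The maximum possible score should then be the sum of all weights. The field names, search texts and weights below are hypothetical, and the filter syntax assumes Elasticsearch 2.0 or later, where a query such as match can be used directly as a filter.

```python
import json


def build_weighted_query(criteria):
    """Build a function_score request body.

    criteria: list of (field, text, weight) tuples (hypothetical values).
    """
    should_clauses = [{"match": {field: text}} for field, text, _ in criteria]
    functions = [
        {"filter": {"match": {field: text}}, "weight": weight}
        for field, text, weight in criteria
    ]
    return {
        "query": {
            "function_score": {
                # Only return documents that match at least one criterion.
                "query": {"bool": {"should": should_clauses, "minimum_should_match": 1}},
                "functions": functions,
                "score_mode": "sum",      # add up the weights of matching functions
                "boost_mode": "replace",  # ignore the relevance score of the base query
            }
        }
    }


def score_as_ratio(score, criteria):
    """Divide a returned score by the largest score the weights allow."""
    max_possible = sum(weight for _, _, weight in criteria)
    return score / max_possible


if __name__ == "__main__":
    criteria = [("title", "foo", 3), ("title", "bar", 2), ("title", "baz", 1)]
    print(json.dumps(build_weighted_query(criteria), indent=2))
    # A hit that matched only the "foo" criterion would be expected to score 3.
    print(score_as_ratio(3, criteria))
```

The trade-off is that this throws away the TF/IDF relevance entirely: a document either satisfies a criterion or it does not, which is what makes the ratio meaningful.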