ElasticSearch default scoring mechanism
Problem description
I'm relatively new to ElasticSearch. What I am looking for is a plain, clear explanation of how the default scoring mechanism of ElasticSearch (Lucene) really works. Does it use Lucene scoring, or does it use scoring of its own?
For example, I want to search for documents by, say, the "Name" field. I use the .NET NEST client to write my queries. Consider this type of query:
IQueryResponse<SomeEntity> queryResult = client.Search<SomeEntity>(s =>
    s.From(0)
     .Size(300)
     .Explain()
     .Query(q => q.Match(a => a.OnField(q.Resolve(f => f.Name)).QueryString("ExampleName")))
);
which translates to this JSON query:
{
  "from": 0,
  "size": 300,
  "explain": true,
  "query": {
    "match": {
      "Name": {
        "query": "ExampleName"
      }
    }
  }
}
The search runs over about 1.1 million documents. What I get in return is the following (only part of the result, formatted on my own):
650 "ExampleName" 7,313398
651 "ExampleName" 7,313398
652 "ExampleName" 7,313398
653 "ExampleName" 7,239194
654 "ExampleName" 7,239194
860 "ExampleName of Something" 4,5708737
where the first field is just an Id, the second is the Name field that ElasticSearch searched on, and the third is the score.
As you can see, there are many duplicates in the ES index. Some of the documents found have different scores even though they are exactly the same (only the Id differs). From this I concluded that different shards searched different parts of the whole dataset, which leads me to suspect that the score depends partly on the overall data in a given shard, not exclusively on the document the search engine is actually considering.
The question is, how exactly does this scoring work? Could you show me or point me to the exact formula used to calculate the score for each document found by ES? And finally, how can this scoring mechanism be changed?
Thanks in advance.
The default scoring is the DefaultSimilarity algorithm in core Lucene, largely documented in Lucene's own documentation for that class. You can customize scoring by configuring your own Similarity, or by using something like a custom_score query.
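For reference, the formula behind DefaultSimilarity, as described in Lucene's TF-IDF similarity documentation for the Lucene versions of that era, can be written as below. The notation is a transcription for illustration, not something taken from this thread. numDocs and docFreq are the index statistics Lucene sees, and in ElasticSearch those are per shard by default.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d)

% Factors as implemented by DefaultSimilarity:
\mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}
\qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}

\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\sum_{t \in q}\bigl(\mathrm{idf}(t)\cdot \mathrm{boost}(t)\bigr)^{2}}}
\qquad
\mathrm{norm}(t,d) \propto \frac{1}{\sqrt{\text{number of terms in the field}}}
```

Because idf depends on numDocs and docFreq, two identical documents can receive different scores when they sit on shards with different term statistics.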
The odd score variation in the first five results seems small enough that it doesn't concern me much as far as the validity of the query results and their ordering go, but if you want to understand its cause, the explain API can show you exactly what is going on there.
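As a rough sketch of what the explain output would be expected to boil down to for the query in the question, assume DefaultSimilarity, no boosts, a single-term query, and a Name value that analyzes to exactly one term. These are assumptions, not facts confirmed in the thread. Under them the query normalization cancels one factor of idf, and the term frequency and field norm are both 1, so the score reduces to the idf of the shard that holds the document:

```latex
% Single term t: queryNorm = 1/idf(t), tf = 1, norm = 1
\mathrm{score}(q,d) = \frac{1}{\mathrm{idf}(t)}\cdot 1\cdot \mathrm{idf}(t)^{2}\cdot 1
                    = \mathrm{idf}(t)
                    = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}

% Reading the two observed scores back through that expression:
7.313398 \;\Rightarrow\; \frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1} = e^{6.313398} \approx 552
\qquad
7.239194 \;\Rightarrow\; \frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1} = e^{6.239194} \approx 512
```

If the assumptions hold, the two score groups would come from shards where the term is slightly more or less common relative to the shard size, which matches the suspicion raised in the question. The explain output for one document from each group should show the actual docFreq and document count used.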