ElasticSearch default scoring mechanism


Question



I'm relatively new to ElasticSearch. What I am looking for is a plain, clear explanation of how the default scoring mechanism of ElasticSearch (Lucene) really works. Does it use Lucene scoring, or does it use scoring of its own?

For example, I want to search for documents by the "Name" field. I use the .NET NEST client to write my queries. Consider this type of query:

IQueryResponse<SomeEntity> queryResult = client.Search<SomeEntity>(s => s
    .From(0)
    .Size(300)
    .Explain()
    .Query(q => q.Match(a => a.OnField(q.Resolve(f => f.Name)).QueryString("ExampleName")))
);

which translates to the following JSON query:

{
  "from": 0,
  "size": 300,
  "explain": true,
  "query": {
    "match": {
      "Name": {
        "query": "ExampleName"
      }
    }
  }
}

The search runs against about 1.1 million documents. What I get in return is the following (only part of the result, formatted by me):

650   "ExampleName" 7,313398

651   "ExampleName" 7,313398

652   "ExampleName" 7,313398

653   "ExampleName" 7,239194

654   "ExampleName" 7,239194

860   "ExampleName of Something" 4,5708737  

where the first field is just an Id, the second is the Name field that ElasticSearch searched on, and the third is the score.

As you can see, there are many duplicates in the ES index. Some of the documents found have different scores even though they are identical (apart from the Id), so I concluded that different shards searched different parts of the whole dataset. That leads me to suspect that the score depends partly on the overall data in a given shard, not only on the document actually being scored.

The question is: how exactly does this scoring work? Could you show me, or point me to, the exact formula used to calculate the score of each document found by ES? And finally, how can this scoring mechanism be changed?

Thanks in advance.

Answer

The default scoring is the DefaultSimilarity algorithm in core Lucene, largely documented here. You can customize scoring by configuring your own Similarity, or using something like a custom_score query.
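For orientation, the classic Lucene TF-IDF scoring function that DefaultSimilarity implements can be summarized roughly as follows. This is a summary of the formula as documented for Lucene's TFIDFSimilarity in the 3.x/4.x line; the symbol names follow that documentation, and details may differ between versions, so treat it as a sketch rather than a specification.

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t,d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Big)

\mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\!\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}\right), \qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}
```

Here numDocs and docFreq are index statistics. In Elasticsearch they are, by default, taken from the shard that holds the document, which is why identical documents on different shards can receive slightly different scores.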

The odd score variation in the first five results seems small enough that it doesn't concern me much as far as the validity of the query results and their ordering go, but if you want to understand its cause, the explain API can show you exactly what is going on there.
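To make the per-shard effect concrete, here is a minimal Python sketch of the classic TF-IDF arithmetic for a one-term query. It assumes the simplification that, for a single-term query, queryNorm cancels one idf factor so the score reduces to sqrt(freq) * idf * fieldNorm. The shard statistics are hypothetical and are not taken from the question; only the two scores in the last step come from the results quoted above.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene TF-IDF idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def single_term_score(freq: int, doc_freq: int, num_docs: int,
                      field_norm: float = 1.0) -> float:
    """Score of a one-term query under the simplification described above:
    sqrt(freq) * idf * fieldNorm."""
    return math.sqrt(freq) * idf(doc_freq, num_docs) * field_norm


# Hypothetical per-shard statistics (assumed for illustration only).
# Same document content, but the term is slightly more common on shard B.
shard_a = single_term_score(freq=1, doc_freq=400, num_docs=220_000)
shard_b = single_term_score(freq=1, doc_freq=430, num_docs=220_000)
print(f"shard A: {shard_a:.6f}  shard B: {shard_b:.6f}")

# Ratio between two scores reported in the question:
# "ExampleName of Something" versus "ExampleName".
ratio = 4.5708737 / 7.313398
print(f"ratio of reported scores: {ratio:.6f}")
```

The two shard scores differ only because docFreq differs, which mirrors the small gap between 7,313398 and 7,239194 in the question. The ratio of the reported scores comes out very close to 0.625, which would be consistent with a length-dependent field norm lowering the score of the longer Name value; the explain output is the way to confirm that. Elasticsearch also offers the dfs_query_then_fetch search type, which gathers term statistics across shards before scoring, if the per-shard variation matters.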
