Wrong score in Elasticsearch result

Question

I am not getting the correct scores in the results of an Elasticsearch query.

ES query:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "(emergency) OR (emergency*) OR (*emergency) OR (*emergency*)",
            "fields": [
              "MDMGlobalData.Name1"
            ]
          }
        }
      ]
    }
  }
}

ES result:

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 798,
      "relation": "eq"
    },
    "max_score": 9.169065,
    "hits": [
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551037160",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PARAGON EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551040507",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551076447",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "COASTAL EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551100746",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551090880",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PAFFORD EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551106787",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "CAPROCK EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551021568",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "WILTON EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551124137",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY ONE"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551125549",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY ONE"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551133066",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      }
    ]
  }
}

Ideally, the first results should be the documents whose Name1 value is just "emergency" or starts with the word "emergency".

And how can almost all of the first 5 results have the same score when their Name1 values are different?

Because of the wrong scoring, the results are out of order. How can the scores in the result be corrected?

Recommended answer

No, that need not be the case, because ES follows the Lucene scoring function.

Reasons for the same score:

  1. Each document has only two terms: emergency and one more word.
  2. The word emergency matches as it is, and the field length is the same.
  3. The number of occurrences is one, i.e. the term frequencies are the same.
  4. The relevancy of the term (idf) is the same for all of these documents.
  5. Coord (a factor used only by the older TF/IDF similarity) is the same, since every document contains Emergency exactly once.

But if you have a document with Emergency X Y Z, its score will be lower than that of the other documents you have. The term frequency is still one, but the field is longer, so field-length normalization reduces the weight of the match.

And if a document contains only Emergency, its score will be higher than all the others.

It is perfectly normal to get the same score in your scenario, because the query gives no indication of which emergency the user meant.
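One way to check which of these factors contributed to a hit's score is to send the search with the `explain` flag set in the request body, which asks ES to return a per-hit score breakdown. The sketch below only builds such a body in Python and prints it; the helper name `with_explain` is hypothetical, and the query is the one from the question.

```python
import json


def with_explain(query):
    """Wrap a query clause in a search body that requests score explanations.

    `with_explain` is a hypothetical helper for illustration only.
    """
    return {"explain": True, "query": query}


# The query_string clause from the question, unchanged.
body = with_explain(
    {
        "query_string": {
            "query": "(emergency) OR (emergency*) OR (*emergency) OR (*emergency*)",
            "fields": ["MDMGlobalData.Name1"],
        }
    }
)

# Print the JSON that would be sent as the search request body.
print(json.dumps(body, indent=2))
```

Each hit in the response should then carry an `_explanation` object alongside `_score`, which makes it possible to compare two hits factor by factor.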

Update, using a term query:

{
    "query":{
        "bool":{
            "must":{
                "term":{
                    "MDMGlobalData.Name1":"emergency"
                }
            }
        }
    }
}

With sample data, the output is:

"hits": [
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "iN1hKnMBojxRtp6HNI7d",
        "_score": 0.10938574,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "g91TKnMBojxRtp6Hto4q",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PARAGON EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "hN1TKnMBojxRtp6H2I6A",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "hd1TKnMBojxRtp6H_I6_",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "COASTAL EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "h91VKnMBojxRtp6HYI4e",
        "_score": 0.07223585,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD X"
          }
        }
      }
    ]
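The ordering in this output can be reproduced by hand with a BM25-style formula. The following is a minimal sketch, not the actual Lucene implementation. It assumes that the sample index holds exactly the five documents shown, on a single shard, that the field is split into one token per word, and that the default BM25 parameters (k1 = 1.2, b = 0.75) are in use. None of these assumptions is confirmed by the original post, and the function name `bm25_score` is hypothetical.

```python
import math

K1 = 1.2   # assumed default BM25 k1
B = 0.75   # assumed default BM25 b


def bm25_score(tf, doc_len, avg_len, doc_count, doc_freq, k1=K1, b=B):
    """Score of a single term in a single field (hypothetical helper)."""
    # Rarer terms get a higher idf; here every document contains the term.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer fields shrink this factor: that is field-length normalization.
    norm_tf = tf / (tf + k1 * (1 - b + b * doc_len / avg_len))
    # Constant factor (k1 + 1), the same for every document.
    boost = k1 + 1
    return boost * idf * norm_tf


# Assumption: these five names are the whole sample index.
names = [
    "EMERGENCY",
    "PARAGON EMERGENCY",
    "EMERGENCY MD",
    "COASTAL EMERGENCY",
    "EMERGENCY MD X",
]

lengths = {name: len(name.split()) for name in names}
avg_len = sum(lengths.values()) / len(names)

# "emergency" occurs once in every document, so tf = 1 and doc_freq = 5.
scores = {
    name: bm25_score(1, lengths[name], avg_len, len(names), len(names))
    for name in names
}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:20s} {score:.8f}")
```

Under these assumptions the three two-word names get identical scores, the one-word name scores highest, and the three-word name scores lowest, with values close to the `_score` fields listed above. The only input that differs between the documents is the field length.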
