Assign a higher score to matches containing the search query at an earlier position in Elasticsearch

Views: 139. Published: 2020/7/5 21:14:00. Tags: elasticsearch, n-gram, relevance, booleanquery

Problem description

This question is similar to my other question, which Val answered.

I have an index containing 3 documents.

    {
        "firstname": "Anne",
        "lastname": "Borg"
    }

    {
        "firstname": "Leanne",
        "lastname": "Ray"
    }

    {
        "firstname": "Anne",
        "middlename": "M",
        "lastname": "Stone"
    }

When I search for "Ann", I would like Elasticsearch to return all 3 of these documents (because they all match the term "Ann" to a degree). However, I would like Leanne Ray to have a lower score (relevance ranking), because the search term "Ann" appears at a later position in that document than it does in the other two.

Here are my index settings...

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "my_tokenizer"
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "token_chars": [
                        "letter",
                        "digit",
                        "custom"
                    ],
                    "custom_token_chars": "'-",
                    "min_gram": "1",
                    "type": "ngram",
                    "max_gram": "2"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "firstname": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                },
                "copy_to": [
                    "full_name"
                ]
            },
            "lastname": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                },
                "copy_to": [
                    "full_name"
                ]
            },
            "middlename": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                },
                "copy_to": [
                    "full_name"
                ]
            },
            "full_name": {
                "type": "text",
                "analyzer": "my_analyzer",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

The following query brings back the expected documents, but assigns a higher score to Leanne Ray than to Anne Borg.

{
    "query": {
        "bool": {
            "must": {
                "query_string": {
                    "query": "Ann",
                    "fields": ["full_name"]
                }
            },
            "should": {
                "match": {
                    "full_name": "Ann"
                }
            }
        }
    }
}

Here are the results...

"hits": [
        {
            "_index": "contacts_4",
            "_type": "_doc",
            "_id": "2",
            "_score": 6.6333585,
            "_source": {
                "firstname": "Anne",
                "middlename": "M",
                "lastname": "Stone"
            }
        },
        {
            "_index": "contacts_4",
            "_type": "_doc",
            "_id": "1",
            "_score": 6.142234,
            "_source": {
                "firstname": "Leanne",
                "lastname": "Ray"
            }
        },
        {
            "_index": "contacts_4",
            "_type": "_doc",
            "_id": "3",
            "_score": 6.079495,
            "_source": {
                "firstname": "Anne",
                "lastname": "Borg"
            }
        }
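
As a rough offline check of the ordering above, the sketch below emulates a 1-to-2 character n-gram tokenizer with lowercasing in plain Python and counts how often each gram of "Ann" occurs per document. It is not Elasticsearch's analysis chain: the helper names `char_ngrams` and `analyze` are hypothetical, splitting on whitespace stands in for the `token_chars` setting, and each name part is tokenized on its own on the assumption that `copy_to` adds the values to `full_name` separately. Term frequency is only one input to the score, so the counts hint at the ordering rather than explain it in full.

```python
from collections import Counter


def char_ngrams(word, min_gram=1, max_gram=2):
    """Return every substring of length min_gram..max_gram, position by position."""
    return [
        word[i:i + n]
        for i in range(len(word))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(word)
    ]


def analyze(values, lowercase=True):
    """Simplified stand-in for an n-gram tokenizer plus an optional lowercase filter."""
    tokens = []
    for value in values:
        for word in value.split():
            tokens.extend(char_ngrams(word.lower() if lowercase else word))
    return tokens


# The three documents from the question, as the values copied into full_name.
docs = {
    "Anne Borg": ["Anne", "Borg"],
    "Leanne Ray": ["Leanne", "Ray"],
    "Anne M Stone": ["Anne", "M", "Stone"],
}

query_terms = set(analyze(["Ann"]))  # {'a', 'an', 'n', 'nn'}

for name, values in docs.items():
    counts = Counter(analyze(values))
    matched = {term: counts[term] for term in sorted(query_terms)}
    print(name, matched, "total:", sum(matched.values()))
```

Under these assumptions, "Leanne Ray" and "Anne M Stone" each contain six occurrences of the query grams (the extra "a" in "Ray" and the extra "n" in "Stone"), while "Anne Borg" contains five, which is consistent with Anne Borg ranking last in the results above.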

Using an ngram token filter and an ngram tokenizer together seems to fix this problem...

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "filter": [
                        "ngram"
                    ],
                    "tokenizer": "ngram"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "firstname": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                },
                "copy_to": [
                    "full_name"
                ]
            },
            "lastname": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                },
                "copy_to": [
                    "full_name"
                ]
            },
            "middlename": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                },
                "copy_to": [
                    "full_name"
                ]
            },
            "full_name": {
                "type": "text",
                "analyzer": "my_analyzer",
                "search_analyzer": "my_analyzer"
            }
        }
    }
}

The same query brings back the expected results with the desired relative scoring. Why does this work? Note that above, I am using an ngram tokenizer with a lowercase filter, and the only difference here is that I am using an ngram filter instead of the lowercase filter.

Here are the results. Notice that Leanne Ray scored lower than both Anne Borg and Anne M Stone, as desired.

"hits": [
    {
        "_index": "contacts_4",
        "_type": "_doc",
        "_id": "3",
        "_score": 4.953257,
        "_source": {
            "firstname": "Anne",
            "lastname": "Borg"
        }
    },
    {
        "_index": "contacts_4",
        "_type": "_doc",
        "_id": "2",
        "_score": 4.87168,
        "_source": {
            "firstname": "Anne",
            "middlename": "M",
            "lastname": "Stone"
        }
    },
    {
        "_index": "contacts_4",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0364896,
        "_source": {
            "firstname": "Leanne",
            "lastname": "Ray"
        }
    }
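
One difference between the two setups that can be checked offline is case handling: the second analyzer has no lowercase filter. The sketch below compares the two chains in plain Python, assuming the default 1-to-2 character gram sizes for both the `ngram` tokenizer and the `ngram` token filter. The function names are hypothetical and this is an emulation, not the Elasticsearch analyzer, so treat the result as a plausible contributor to the score gap rather than a confirmed explanation.

```python
def char_ngrams(text, min_gram=1, max_gram=2):
    """Return every substring of length min_gram..max_gram, position by position."""
    return [
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    ]


def tokenizer_then_lowercase(value):
    """First setup: n-gram tokenizer followed by a lowercase filter."""
    return [gram.lower() for gram in char_ngrams(value)]


def tokenizer_then_ngram_filter(value):
    """Second setup: n-gram tokenizer followed by an n-gram filter, case preserved."""
    tokens = []
    for gram in char_ngrams(value):
        tokens.extend(char_ngrams(gram))
    return tokens


def missing_terms(analyze, query, values):
    """Query terms that do not occur anywhere in the document's values."""
    query_terms = set(analyze(query))
    doc_terms = {term for value in values for term in analyze(value)}
    return sorted(query_terms - doc_terms)


names = {
    "Anne Borg": ["Anne", "Borg"],
    "Leanne Ray": ["Leanne", "Ray"],
    "Anne M Stone": ["Anne", "M", "Stone"],
}

for analyze in (tokenizer_then_lowercase, tokenizer_then_ngram_filter):
    print(analyze.__name__)
    for name, values in names.items():
        print("  ", name, "missing:", missing_terms(analyze, "Ann", values))
```

In this emulation every gram of "Ann" is found in all three names once everything is lowercased, whereas without lowercasing the grams "A" and "An" are absent from "Leanne Ray", which contains only a lowercase "a".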

By the way, this query also brings back a lot of false positives when the index contains other documents as well. It's not a big problem, because these false positives have very low scores relative to the scores of the desirable hits, but it is still not ideal. For example, if I add {firstname: Gideon, lastname: Grossma} to the index, the above query will bring back that document in the result set as well, albeit with a much lower score than the documents containing the string "Ann".
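
The false positive can be reproduced without a cluster by listing the grams that the query and the extra document share. This is a minimal sketch with hypothetical helper names: it emulates case-sensitive character n-grams in plain Python and leaves out the n-gram token filter, which adds no new distinct terms on top of 1-to-2 character grams. The second call uses a `min_gram` of 2 purely for comparison; that value is an assumption for illustration, not a setting from the question, and it would also change which other documents match.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return every substring of length min_gram..max_gram, position by position."""
    return [
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    ]


def shared_terms(query, values, min_gram, max_gram):
    """Grams that occur both in the query and in at least one document value."""
    query_terms = set(char_ngrams(query, min_gram, max_gram))
    doc_terms = {
        term
        for value in values
        for term in char_ngrams(value, min_gram, max_gram)
    }
    return sorted(query_terms & doc_terms)


gideon = ["Gideon", "Grossma"]

print(shared_terms("Ann", gideon, 1, 2))  # ['n']
print(shared_terms("Ann", gideon, 2, 2))  # []
```

With single-character grams allowed, the lone "n" at the end of "Gideon" is enough to produce a match, which fits the low score reported for that document.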
