Elasticsearch match query does not match a document with an apostrophe

Problem description

I'm building a searcher for locality autocompletion, a simpler version of the Google Maps one. Everything seemed to be working fine with the query I was using:

{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "Ametlla",
          "type": "best_fields",
          "fields": [
            "locality",
            "alternative_names"
          ],
          "operator": "and"
        }
      },
      "filter": {
        "term": {
          "country_code": "ES"
        }
      }
    }
  }
}

The issue I discovered is related to a city from Spain: L'Ametlla de Mar.

/localities_index/localities/10088

{
  "_index": "localities_index",
  "_type": "localities",
  "_id": "10088",
  "_version": 1,
  "_seq_no": 133,
  "_primary_term": 4,
  "found": true,
  "_source": {
    "country_code": "es",
    "locality": "L'Ametlla de Mar",
    "alternative_names": []
  }
}

You can search for Ametlla and the document is matched (see the following partial-name example query):

{
    "query": {
        "match": {
            "locality": {
                "query" : "Ametlla"
            }
        }
    }
}

/localities_index/localities/10088/_explain

{
  "_index": "localities_index",
  "_type": "localities",
  "_id": "10088",
  "matched": true,
  "explanation": {
    "value": 3.3985975,
    "description": "weight(locality:ametlla in 2) [PerFieldSimilarity], result of:",
    "details": [
      {
        "value": 3.3985975,
        "description": "score(freq=1.0), product of:",
        "details": [
          {
            "value": 2.2,
            "description": "boost",
            "details": []
          },
          {
            "value": 3.6686769,
            "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            "details": [
              {
                "value": 2,
                "description": "n, number of documents containing term",
                "details": []
              },
              {
                "value": 97,
                "description": "N, total number of documents with field",
                "details": []
              }
            ]
          },
          {
            "value": 0.4210829,
            "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            "details": [
              {
                "value": 1.0,
                "description": "freq, occurrences of term within document",
                "details": []
              },
              {
                "value": 1.2,
                "description": "k1, term saturation parameter",
                "details": []
              },
              {
                "value": 0.75,
                "description": "b, length normalization parameter",
                "details": []
              },
              {
                "value": 9.0,
                "description": "dl, length of field",
                "details": []
              },
              {
                "value": 7.5360823,
                "description": "avgdl, average length of field",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}
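
To see why Ametlla on its own matches, here is a rough Python approximation (not Elasticsearch's own code) of the index-time autocomplete analyzer from the settings further down: an edge_ngram tokenizer with token_chars letter and digit, min_gram 1 and max_gram 15, followed by lowercasing. custom_token_chars and asciifolding are deliberately left out of this sketch. Because the apostrophe is neither a letter nor a digit, it splits L'Ametlla into l and ametlla, which is consistent with the locality:ametlla term in the explain output above.

```python
def edge_ngram_tokens(text, min_gram=1, max_gram=15):
    """Rough approximation of the `autocomplete` index analyzer:
    edge_ngram tokenizer with token_chars [letter, digit], then lowercase.
    Any other character (apostrophe, space, ...) ends the current word."""
    words, current = [], []
    for ch in text:
        if ch.isalpha() or ch.isdigit():
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))

    tokens = []
    for word in words:
        word = word.lower()  # stands in for the lowercase filter
        # Edge n-grams: prefixes of length min_gram..max_gram.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens


print(edge_ngram_tokens("L'Ametlla de Mar"))
# ['l', 'a', 'am', 'ame', 'amet', 'ametl', 'ametll', 'ametlla', 'd', 'de', 'm', 'ma', 'mar']
```

With the real index, the same check can be made against Elasticsearch itself with the _analyze API and "analyzer": "autocomplete".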

But if you search for its full name, it is not matched.

I've tried adding punctuation to token_chars, as suggested at https://stackoverflow.com/a/49362505, but it didn't work. So I tried adding ' as custom_token_chars, and that didn't work either.

/localities_index/_settings

{
  "localities_index": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "localities_index",
        "creation_date": "1596537683568",
        "analysis": {
          "analyzer": {
            "autocomplete": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "autocomplete"
            },
            "autocomplete_search": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "lowercase"
            }
          },
          "tokenizer": {
            "autocomplete": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "custom_token_chars": "'",
              "min_gram": "1",
              "type": "edge_ngram",
              "max_gram": "15"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "lS3Ork2zSySYJbJYmx29aw",
        "version": {
          "created": "7040099"
        }
      }
    }
  }
}
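
As we read the Elasticsearch reference for the edge_ngram tokenizer, custom_token_chars is only used when the custom class is also listed in token_chars, and the settings above list only letter and digit. The Python sketch below is a hypothetical model of that word-splitting step (it is not Elasticsearch code, and split_words is an illustrative name), covering only the classes mentioned in this question.

```python
import unicodedata


def split_words(text, token_chars, custom_token_chars=""):
    """Hypothetical model of how an (edge_)ngram tokenizer decides which
    characters stay inside a word before n-grams are built."""

    def kept(ch):
        if "letter" in token_chars and ch.isalpha():
            return True
        if "digit" in token_chars and ch.isdigit():
            return True
        # Unicode punctuation categories (P*); the ASCII apostrophe is "Po".
        if "punctuation" in token_chars and unicodedata.category(ch).startswith("P"):
            return True
        # Assumption based on the docs: custom_token_chars is consulted only
        # when the "custom" class itself is listed in token_chars.
        if "custom" in token_chars and ch in custom_token_chars:
            return True
        return False

    words, current = [], ""
    for ch in text:
        if kept(ch):
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


name = "L'Ametlla de Mar"
# As posted: custom_token_chars is set, but "custom" is not a token_chars class.
print(split_words(name, ["letter", "digit"], "'"))  # ['L', 'Ametlla', 'de', 'Mar']
# With the custom class listed, the apostrophe stays inside the word.
print(split_words(name, ["letter", "digit", "custom"], "'"))  # ["L'Ametlla", 'de', 'Mar']
```

Keeping the apostrophe at index time only helps if the search side agrees: autocomplete_search uses the lowercase tokenizer, which splits on every non-letter character, so query text like L'Ametlla is still broken into l and ametlla.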

/localities_index/_mapping

{
  "localities_index": {
    "mappings": {
      "properties": {
        "alternative_names": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "country_code": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "locality": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

Recommended answer

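You can use the apostrophe token filter in your custom analyzer, apply it to the field that contains apostrophes (locality), and keep using the match query you already have. Note that locality declares a separate search_analyzer (autocomplete_search), and a match query analyzes its input with that search analyzer rather than the index-time one, so the filter should be added to both autocomplete and autocomplete_search for the query text and the indexed text to be processed the same way.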
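
A rough Python imitation of what the apostrophe token filter does to already-tokenized input (not Elasticsearch code; per the Elasticsearch reference, the filter strips all characters after an apostrophe, including the apostrophe itself). Two things are worth checking with the _analyze API before relying on it: the filter only has something to strip if the tokenizer leaves the apostrophe inside the token, which an edge_ngram tokenizer with token_chars letter and digit does not, and for L'Ametlla it keeps l rather than ametlla.

```python
def apostrophe_filter(tokens):
    """Approximates the `apostrophe` token filter: each token is cut at its
    first apostrophe, dropping the apostrophe and everything after it.
    Only the ASCII apostrophe (') is handled in this sketch."""
    result = []
    for token in tokens:
        index = token.find("'")
        result.append(token if index == -1 else token[:index])
    return result


# Example along the lines of the Elasticsearch docs (Turkish suffixes).
print(apostrophe_filter(["Istanbul'a", "veya", "Istanbul'dan"]))  # ['Istanbul', 'veya', 'Istanbul']
# Applied to this question's locality, it keeps the article, not the name.
print(apostrophe_filter(["l'ametlla", "de", "mar"]))  # ['l', 'de', 'mar']
```

If the goal is to drop the leading article instead, Elasticsearch's elision token filter, which removes configured elisions such as l' from the beginning of a token, may be a closer fit; like the apostrophe filter, it only sees tokens that still contain the apostrophe.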
