Exact-match, case-insensitive match without normalization in Elasticsearch 6.2


Problem description

I have looked at every article and post I could find about performing exact-match, case-insensitive queries, but once implemented, none of them does what I am looking for.

Before marking this question as a duplicate, please read the entire post.

Given a username, I want to query my Elasticsearch database and return only the document whose username matches exactly, ignoring case.

I have tried specifying a lowercase analyzer for my username property and using a match query to implement this behavior. While this solves the problem of case-insensitive matching, it fails at exact matching.

I looked into using a lowercase normalizer, but that would lowercase all of my usernames before indexing, so when I aggregate on usernames they would come back in lowercase, which is not what I want. I need to preserve the original case of each letter in the username.

POST {elastic}/users/_doc

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

This document will be stored in an index called users exactly as it is.

GET {frontend}/user/UsErNaMe

should return

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

GET {frontend}/user/username

should return

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

GET {frontend}/user/USERNAME

should return

{
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567"
}

GET {frontend}/user/UsErNaMe $RaNdoM LeTteRs

should return nothing.

Thanks.

Recommended answer

To achieve a case-insensitive exact match, you need to define your own analyzer. The analyzer needs to do two things:

  1. Lowercase the input value (for case insensitivity).
  2. Make no other modification to the input besides lowercasing (for exact matching).

Both can be achieved as follows:

When defining the custom analyzer:
  1. Use the lowercase filter.
  2. Set the tokenizer to keyword, so the whole input value is emitted as a single token, which the lowercase filter then lowercases.
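To see why the tokenizer choice matters, here is a minimal Python model of the two analysis chains. It does not call Elasticsearch: `keyword_tokenizer`, `whitespace_tokenizer`, `lowercase_filter`, and `analyze` are hypothetical helpers that only imitate the behavior described above, and the whitespace split is a simplified stand-in for any tokenizer that breaks the input into words.

```python
def keyword_tokenizer(text):
    # Emits the entire input as one token, like the "keyword" tokenizer.
    return [text]

def whitespace_tokenizer(text):
    # Simplified stand-in for a tokenizer that splits the input into words.
    return text.split()

def lowercase_filter(tokens):
    # Token filter: lowercases every token produced by the tokenizer.
    return [t.lower() for t in tokens]

def analyze(text, tokenizer):
    # An analyzer runs the tokenizer first, then the token filters.
    return lowercase_filter(tokenizer(text))

query = "UsErNaMe $RaNdoM LeTteRs"
exact_tokens = analyze(query, keyword_tokenizer)     # one token for the whole string
split_tokens = analyze(query, whitespace_tokenizer)  # one token per word
print(exact_tokens)
print(split_tokens)
```

With a word-splitting tokenizer, one of the query tokens is username, so a match query would still hit the document. With the keyword tokenizer, the only token is the full lowercased string, which is why the match stays exact.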

This custom analyzer can now be applied to any text field where case-insensitive exact search is required.

To create the index, you can use the following:

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "email": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "username": {
          "type": "text",
          "analyzer": "case_insensitive_analyzer"
        },
        "password": {
          "type": "keyword"
        }
      }
    }
  }
}

In the request above, case_insensitive_analyzer is the required analyzer and, as you can see, it is applied to the username field.

So when you index a document like this:

PUT test/_doc/1
{
  "email": "random@email.com",
  "username": "UsErNaMe",
  "password": "1234567"
}

For the field username, the input is UsErNaMe. The analyzer first runs the keyword tokenizer, which does nothing but emit the whole input as a single token, UsErNaMe. It then applies the lowercase filter to that token, producing username, which is the term that gets indexed. The stored document source still contains the original value, UsErNaMe.

Now you can use a match query like the one below to search against the username field:

GET test/_doc/_search
{
  "query": {
    "match": {
      "username": "USERNAME"
    }
  }
}

The query above gives you the desired output. Replace USERNAME in the query with username, UsErNaMe, or USERname, and each will match the document. The reason is that when no search analyzer is explicitly specified, Elasticsearch analyzes the query text with the same analyzer that was applied to the field at index time. Here, searching the username field applies case_insensitive_analyzer to the input value USERNAME, which produces the token username and hence the match.
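The matching behavior can be sketched with a small Python model of index-time and search-time analysis. This is only an illustration under simplifying assumptions, not a client call to Elasticsearch: `analyze` and `match_username` are hypothetical helpers, and Python's `str.lower()` stands in for the lowercase filter.

```python
def analyze(text):
    # Models case_insensitive_analyzer: the keyword tokenizer keeps the whole
    # input as one token, then the lowercase filter lowercases it.
    return [text.lower()]

# Index time: only the analyzed token goes into the (modelled) inverted index;
# the stored document keeps its original casing.
stored_doc = {
    "email": "random@email.com",
    "username": "UsErNaMe",
    "password": "1234567",
}
indexed_tokens = set(analyze(stored_doc["username"]))

def match_username(query_text):
    # Search time: the query text goes through the same analyzer, and the
    # document matches only if a query token equals an indexed token.
    hit = any(tok in indexed_tokens for tok in analyze(query_text))
    return stored_doc if hit else None

for q in ["UsErNaMe", "username", "USERNAME", "UsErNaMe $RaNdoM LeTteRs"]:
    print(q, "->", match_username(q))
```

The first three queries return the stored document with its original UsErNaMe casing, while the last one returns nothing because its single token is not equal to username.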
