Exact-match, case-insensitive match without normalization in Elasticsearch 6.2
Question
I have looked at every article and post I could find about performing exact-match, case-insensitive queries, but once implemented, none of them do what I am looking for.
Before marking this question as a duplicate, please read the entire post.
Given a username, I want to query my Elasticsearch database and return only a document whose username matches exactly, ignoring case.
I have tried specifying a lowercase analyzer for my username property and using a match query to implement this behavior. While this solves the problem of case-insensitive matching, it fails at exact matching.
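To illustrate why that combination can fail at exact matching, here is a minimal Python sketch. It is not Elasticsearch code: it assumes the lowercase analyzer in question also splits the text into several tokens, and that the match query succeeds when any query token is found. The names tokenizing_lowercase_analyzer and match_any are hypothetical.

```python
import re

def tokenizing_lowercase_analyzer(text):
    """Assumed stand-in: split on non-alphanumeric characters, then lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def match_any(query_text, indexed_text, analyzer):
    """Simplified model of a match query with OR semantics: hit if any query token is indexed."""
    indexed_tokens = set(analyzer(indexed_text))
    return any(tok in indexed_tokens for tok in analyzer(query_text))

stored = "UsErNaMe"

# Case-insensitive matching works as wanted.
print(match_any("USERNAME", stored, tokenizing_lowercase_analyzer))

# The longer input also hits, because its first token is "username".
print(match_any("UsErNaMe $RaNdoM LeTteRs", stored, tokenizing_lowercase_analyzer))
```

Under these assumptions both calls print True, which is exactly the unwanted behavior: the second input should not match.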
I looked into using a lowercase normalizer, but that would lowercase all of my usernames before indexing, so when I aggregate the usernames they would be returned in lowercase form, which is not what I want. I need to preserve the original case of each letter in the username.
POST {elastic}/users/_doc
{
"email": "random@email.com",
"username": "UsErNaMe",
"password": "1234567"
}
This document will be stored in an index called users exactly the way it is.
GET {frontend}/user/UsErNaMe
should return
{
"email": "random@email.com",
"username": "UsErNaMe",
"password": "1234567"
}
and
GET {frontend}/user/username
should return
{
"email": "random@email.com",
"username": "UsErNaMe",
"password": "1234567"
}
and
GET {frontend}/user/USERNAME
should return
{
"email": "random@email.com",
"username": "UsErNaMe",
"password": "1234567"
}
and
GET {frontend}/user/UsErNaMe $RaNdoM LeTteRs
should return nothing.
Thanks.
Recommended answer
To achieve a case-insensitive exact match, you need to define your own analyzer. The analyzer needs to do two things:
- Lowercase the input value (for case insensitivity).
- Make no other modification to the input (for exact matching).
Both can be achieved as follows:
- Use the lowercase filter when defining the custom analyzer.
- Set the tokenizer to keyword, which ensures that the whole input value is emitted as a single token; the lowercase filter is then applied to that token.
This custom analyzer can now be applied to any text field that requires case-insensitive exact search.
So, to create the index, you can use the following:
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "email": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "username": {
          "type": "text",
          "analyzer": "case_insensitive_analyzer"
        },
        "password": {
          "type": "keyword"
        }
      }
    }
  }
}
In the above, case_insensitive_analyzer is the required analyzer, and as you can see it is applied to the username field.
So when you index a document as below:
PUT test/_doc/1
{
"email": "random@email.com",
"username": "UsErNaMe",
"password": "1234567"
}
For the field username, the input is UsErNaMe. The analyzer first runs the keyword tokenizer, which does not split the input and emits it unchanged as a single token, UsErNaMe. It then applies the lowercase filter to that token, so the single indexed token is username.
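The analysis steps can be sketched in plain Python. This is a simplified model rather than Elasticsearch's implementation, and the function names are hypothetical. Note that in an Elasticsearch analyzer the tokenizer runs before the token filters; for this particular analyzer the resulting token is the same whichever order you picture.

```python
def keyword_tokenizer(text):
    """Emit the entire input as one token, without splitting it."""
    return [text]

def lowercase_filter(tokens):
    """Lowercase every token produced by the tokenizer."""
    return [tok.lower() for tok in tokens]

def case_insensitive_analyzer(text):
    """Run the tokenizer first, then the token filter."""
    return lowercase_filter(keyword_tokenizer(text))

# The whole username becomes a single lowercased token.
print(case_insensitive_analyzer("UsErNaMe"))

# Extra words are not split off, so they stay part of the same token.
print(case_insensitive_analyzer("UsErNaMe $RaNdoM LeTteRs"))
```

Because the longer input stays one token, it can never be equal to the token produced for UsErNaMe alone.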
Now you can use a match query like the one below to search against the username field:
GET test/_doc/_search
{
"query": {
"match": {
"username": "USERNAME"
}
}
}
The above will give you the desired output. Replace USERNAME in the query with username, UsErNaMe, or USERname, and all of them will match the document. The reason is that if no search analyzer is explicitly specified, Elasticsearch analyzes the query text with the analyzer defined for the field in the mapping. In the above case, when searching against the username field, case_insensitive_analyzer is applied to the input value USERNAME, which results in the token username and hence the match.
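As a rough illustration of the search-time behavior, the following Python sketch models the index as a dictionary from token to stored documents. It is an assumption-laden toy, not how Elasticsearch stores data; analyze, index, and search are hypothetical names.

```python
def analyze(text):
    """Stand-in for case_insensitive_analyzer: whole input as one lowercased token."""
    return [text.lower()]

# Hypothetical in-memory index: token -> original documents, stored unmodified.
index = {}
doc = {"email": "random@email.com", "username": "UsErNaMe", "password": "1234567"}
for token in analyze(doc["username"]):
    index.setdefault(token, []).append(doc)

def search(username):
    """Analyze the query text with the same analyzer, then look up each token."""
    hits = []
    for token in analyze(username):
        hits.extend(index.get(token, []))
    return hits

queries = ["USERNAME", "username", "UsErNaMe", "USERname", "UsErNaMe $RaNdoM LeTteRs"]
for query in queries:
    print(query, "->", [hit["username"] for hit in search(query)])
```

In this model the first four queries return the document with its username in the original case, and the last query returns no hits, which mirrors the behavior asked for in the question.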