Emails not being searched properly in elasticsearch
Problem description
I have indexed a few documents in elasticsearch that have an email ID as a field. But when I query for a specific email ID, the search results show all the documents without any filtering.
This is the query I used:
{
  "query": {
    "match": {
      "mail-id": "abc@gmail.com"
    }
  }
}
Recommended answer
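By default, your mail-id field is analyzed by the standard analyzer, which splits the email abc@gmail.com at the @ sign. Since a match query by default only needs one of the query's tokens to match, every document whose address shares either part (for example, any gmail.com address) comes back. The standard analyzer produces the following two tokens: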
{
  "tokens" : [ {
    "token" : "abc",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "gmail.com",
    "start_offset" : 4,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}
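To see why those two tokens make the query match everything, here is a rough Python sketch. It is not Elasticsearch's real tokenizer: `standard_like_tokens` and `match_any` are hypothetical helpers that only imitate the split at `@` shown above and the default OR behaviour of a match query, and the sample addresses are made up.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the tokenization shown above:
    split on '@' and whitespace, then lowercase each piece."""
    return [t.lower() for t in re.split(r"[@\s]+", text) if t]

def match_any(query, field_value):
    """Imitate a match query with the default OR operator:
    a document is a hit if it shares at least one token with the query."""
    query_tokens = set(standard_like_tokens(query))
    doc_tokens = set(standard_like_tokens(field_value))
    return bool(query_tokens & doc_tokens)

# Made-up sample data standing in for the indexed mail-id values.
docs = ["abc@gmail.com", "xyz@gmail.com", "abc@yahoo.com", "foo@example.org"]

hits = [d for d in docs if match_any("abc@gmail.com", d)]
print(hits)  # ['abc@gmail.com', 'xyz@gmail.com', 'abc@yahoo.com']
```

If all the indexed addresses are gmail.com ones, every document shares the `gmail.com` token with the query, which would explain results that look unfiltered.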
What you need instead is a custom analyzer that uses the UAX URL email tokenizer (uax_url_email), which emits an email address as a single token.
So you need to define your index as follows:
curl -XPUT localhost:9200/people -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email"
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "mail-id": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}'
After creating that index, you can check with the _analyze API that the email abc@gmail.com is now tokenized as a single token, so your search will work as expected:
curl -XGET 'localhost:9200/people/_analyze?analyzer=my_analyzer&pretty' -d 'abc@gmail.com'
{
  "tokens" : [ {
    "token" : "abc@gmail.com",
    "start_offset" : 0,
    "end_offset" : 13,
    "type" : "<EMAIL>",
    "position" : 1
  } ]
}
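As a rough illustration of what the single token changes for the search, here is a small Python sketch. It does not call Elasticsearch: `whole_email_tokens` is a hypothetical stand-in that assumes the field holds one bare email address and returns it unchanged as the only token, and the sample addresses are made up.

```python
def whole_email_tokens(text):
    """Hypothetical stand-in for my_analyzer on a bare email address:
    the whole address becomes one token, with no lowercasing."""
    return [text.strip()]

def match_any(query, field_value, tokenize):
    """Imitate a match query: a hit needs at least one shared token."""
    return bool(set(tokenize(query)) & set(tokenize(field_value)))

# Made-up sample data standing in for the indexed mail-id values.
docs = ["abc@gmail.com", "xyz@gmail.com", "abc@yahoo.com"]

hits = [d for d in docs if match_any("abc@gmail.com", d, whole_email_tokens)]
print(hits)  # ['abc@gmail.com']
```

Like the analyzer defined above, which has no lowercase filter, this sketch compares addresses case-sensitively.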