Emails not being searched properly in Elasticsearch


Problem description

I have indexed a few documents in Elasticsearch that have an email ID as a field. But when I query for a specific email ID, the search returns all of the documents instead of only the matching one.

This is the query I used:

{
 "query": {
   "match": {
     "mail-id": "abc@gmail.com"
   }
 }
}

Solution

By default, your mail-id field is analyzed by the standard analyzer which will tokenize the email abc@gmail.com into the following two tokens:

{
  "tokens" : [ {
    "token" : "abc",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "gmail.com",
    "start_offset" : 4,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}
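
To illustrate why those two tokens cause every document to be returned, here is a minimal Python sketch. It does not talk to Elasticsearch: standard_like_tokens is a crude, hypothetical stand-in for the standard analyzer (it only splits on "@" and whitespace), match_any imitates a match query with its default OR operator, and the three sample addresses are made up for the example.

```python
import re

def standard_like_tokens(text):
    """Crude stand-in for the standard analyzer (an assumption for this
    sketch): lowercase the text, then split on '@' and whitespace."""
    return [t for t in re.split(r"[@\s]+", text.lower()) if t]

def match_any(query, docs, tokenize):
    """Imitate a match query with the default OR operator: a document
    is a hit if it shares at least one token with the query."""
    query_tokens = set(tokenize(query))
    return [d for d in docs if query_tokens & set(tokenize(d))]

# Hypothetical sample values for the mail-id field.
docs = ["abc@gmail.com", "xyz@gmail.com", "abc@example.org"]

hits = match_any("abc@gmail.com", docs, standard_like_tokens)
print(hits)  # every document shares either "abc" or "gmail.com"
```

Each sample address shares at least one of the tokens abc or gmail.com with the query, so all three come back, which mirrors the behaviour described in the question.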

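A match query returns any document that contains at least one of the query's tokens, so every document with a gmail.com address matches. What you need instead is a custom analyzer that uses the UAX URL email tokenizer (uax_url_email), which keeps an email address together as a single token.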

So you need to define your index as follows:

curl -XPUT localhost:9200/people -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email"
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "mail-id": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}'

After creating that index, you can see that the email abc@gmail.com will be tokenized as a single token and your search will work as expected.

 curl -XGET 'localhost:9200/people/_analyze?analyzer=my_analyzer&pretty' -d 'abc@gmail.com'
{
  "tokens" : [ {
    "token" : "abc@gmail.com",
    "start_offset" : 0,
    "end_offset" : 13,
    "type" : "<EMAIL>",
    "position" : 1
  } ]
}
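
As a rough illustration of why a single token fixes the search, the following Python sketch repeats the matching logic with an email-preserving tokenizer. It is only a simulation under stated assumptions: email_aware_tokens and its regular expression are hypothetical approximations of what uax_url_email does, not its actual rules, and the sample addresses are invented.

```python
import re

# Simplified email pattern, assumed for this sketch only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def email_aware_tokens(text):
    """Keep each email address as one token; split the rest into words."""
    tokens, pos = [], 0
    for m in EMAIL.finditer(text):
        tokens += re.findall(r"\w+", text[pos:m.start()])  # words before the email
        tokens.append(m.group())                           # the whole address
        pos = m.end()
    tokens += re.findall(r"\w+", text[pos:])               # trailing words
    return tokens

def match_any(query, docs, tokenize):
    """A document is a hit if it shares at least one token with the query."""
    query_tokens = set(tokenize(query))
    return [d for d in docs if query_tokens & set(tokenize(d))]

# Hypothetical sample values for the mail-id field.
docs = ["abc@gmail.com", "xyz@gmail.com", "abc@example.org"]

hits = match_any("abc@gmail.com", docs, email_aware_tokens)
print(hits)
```

With the address kept whole, the query token abc@gmail.com no longer overlaps with the other addresses, so only the exact document is returned.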
