Elasticsearch query string doesn't search by word part

Question

      I'm sending this request:

      curl -XGET 'host/process_test_3/14/_search' -d '{
        "query" : {
          "query_string" : {
            "query" : "\"*cor interface*\"",
            "fields" : ["title", "obj_id"]
          }
        }
      }'
      

      And I get the correct result:

      {
        "took": 12,
        "timed_out": false,
        "_shards": {
          "total": 5,
          "successful": 5,
          "failed": 0
        },
        "hits": {
          "total": 3,
          "max_score": 5.421598,
          "hits": [
            {
              "_index": "process_test_3",
              "_type": "14",
              "_id": "141_dashboard_14",
              "_score": 5.421598,
              "_source": {
                "obj_type": "dashboard",
                "obj_id": "141",
                "title": "Cor Interface Monitoring"
              }
            }
          ]
        }
      }
      

      But when I want to search by part of a word, for example:

      curl -XGET 'host/process_test_3/14/_search' -d '
      {
        "query" : {
          "query_string" : {
            "query" : "\"*cor inter*\"",
            "fields" : ["title", "obj_id"]
          }
        }
      }'
      

      I'm getting no results back:

      {
        "took" : 4,
        "timed_out" : false,
        "_shards" : {
          "total" : 5,
          "successful" : 5,
          "failed" : 0
        },
        "hits" : {
          "total" : 0,
          "max_score" : null,
          "hits" : []
        }
      }
      

      What am I doing wrong?

      Solution

      This is because your title field has probably been analyzed by the standard analyzer (the default), which tokenized the title Cor Interface Monitoring into three tokens: cor, interface and monitoring. There is no token inter in the index, so the phrase cor inter finds nothing.
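A rough Python simulation can make this concrete. It assumes the standard analyzer behaves like "split on non-alphanumeric characters, then lowercase" (the real standard tokenizer follows Unicode word-boundary rules, so this is only an approximation), and it assumes the quoted query text goes through the same analysis, which discards the `*` characters. A phrase then matches only when its tokens appear consecutively among the indexed tokens. The helper names are hypothetical.

```python
import re


def standard_like_analyze(text: str) -> list[str]:
    """Approximation of the standard analyzer (assumption): split on runs of
    non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_matches(indexed: list[str], phrase: list[str]) -> bool:
    """True if the phrase tokens occur as a consecutive run in the indexed tokens."""
    n = len(phrase)
    return any(indexed[i:i + n] == phrase for i in range(len(indexed) - n + 1))


title_tokens = standard_like_analyze("Cor Interface Monitoring")
print(title_tokens)  # ['cor', 'interface', 'monitoring']

# '*' is neither a letter nor a digit, so this simplified analyzer drops it.
for query in ("*cor interface*", "*cor inter*"):
    q_tokens = standard_like_analyze(query)
    print(query, q_tokens, phrase_matches(title_tokens, q_tokens))
```

Under these assumptions the first phrase matches and the second does not, because inter is never produced as a token.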

      To search for any substring of a word, you need to create a custom analyzer that uses the ngram token filter, so that every substring of each token (between min_gram and max_gram characters long) is indexed as well.
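As an illustration of the idea behind an n-gram token filter (not Elasticsearch's actual implementation), the sketch below generates every substring of a single token whose length lies between `min_gram` and `max_gram`, ordered by start position and then by length. The function name `ngrams` is hypothetical, and the defaults simply mirror the `min_gram: 2` / `max_gram: 15` values used in the index settings of this answer.

```python
def ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Return every substring of `token` whose length is between
    min_gram and max_gram (inclusive), ordered by start offset, then length."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # no longer substrings fit from this start offset
            grams.append(token[start:start + size])
    return grams


print(ngrams("cor"))  # ['co', 'cor', 'or']
print("inter" in ngrams("interface"))  # True
```

Tokens shorter than `min_gram` produce no grams at all, which is worth keeping in mind when choosing `min_gram`.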

      You can create your index like this:

      curl -XPUT localhost:9200/process_test_3 -d '{
        "settings": {
          "analysis": {
            "analyzer": {
              "substring_analyzer": {
                "tokenizer": "standard",
                "filter": ["lowercase", "substring"]
              }
            },
            "filter": {
              "substring": {
                "type": "nGram",
                "min_gram": 2,
                "max_gram": 15
              }
            }
          }
        },
        "mappings": {
          "14": {
            "properties": {
              "title": {
                "type": "string",
                "analyzer": "substring_analyzer"
              }
            }
          }
        }
      }'
      

      Then reindex your data. With this analyzer, the title Cor Interface Monitoring will be tokenized as:

      • co, cor, or
      • in, int, inte, inter, interf, etc.
      • mo, mon, moni, etc.

      so your second search query will now return the document you expect, because the tokens cor and inter now match.
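A rough Python model of the `substring_analyzer` chain (tokenize, lowercase, then n-grams with min_gram 2 and max_gram 15) can be used to check that the query terms cor and inter now appear among the indexed grams of the title. It assumes the same simplified tokenizer as a stand-in for `standard`, and it only checks term presence: token positions are ignored, so it does not model how Elasticsearch evaluates the quoted phrase query. The function names are hypothetical.

```python
import re


def ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """All substrings of token with length in [min_gram, max_gram]."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def substring_analyze(text: str) -> list[str]:
    """Rough model of substring_analyzer: simplified standard-like
    tokenization (assumption), then lowercase, then n-grams of each token."""
    grams = []
    for tok in re.findall(r"[A-Za-z0-9]+", text):
        grams.extend(ngrams(tok.lower()))
    return grams


indexed = set(substring_analyze("Cor Interface Monitoring"))
for term in ("cor", "inter"):
    print(term, term in indexed)  # both lines end with True
```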
