Elasticsearch query string doesn't search by word part

Question

I'm sending this request:

curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"*cor interface*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

and I get the correct result:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 5.421598,
    "hits": [
      {
        "_index": "process_test_3",
        "_type": "14",
        "_id": "141_dashboard_14",
        "_score": 5.421598,
        "_source": {
          "obj_type": "dashboard",
          "obj_id": "141",
          "title": "Cor Interface Monitoring"
        }
      }
    ]
  }
}

But when I search by part of a word, for example:

curl -XGET 'host/process_test_3/14/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "\"*cor inter*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

I don't get any results:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : []
  }
}

What am I doing wrong?

Answer

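This is because your title field has probably been analyzed by the standard analyzer (the default), so the title Cor Interface Monitoring was indexed as the three lowercase tokens cor, interface and monitoring. Only those whole words exist as terms in the index, so a partial word such as inter has nothing to match. Note also that wildcards are not expanded inside a quoted phrase in a query_string query, so "*cor inter*" is effectively a phrase search for cor followed by inter.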
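To see why inter finds nothing, here is a rough Python approximation of what the standard analyzer does. It only splits on runs of non-alphanumeric ASCII characters and lowercases; the real standard tokenizer follows Unicode word-boundary rules, so treat this as a simplification. The helper name `standard_analyze` is hypothetical. The same simplification also shows the `*` characters being dropped from the quoted query text.

```python
import re


def standard_analyze(text: str) -> list[str]:
    """Rough stand-in for Elasticsearch's standard analyzer (hypothetical helper).

    Splits on anything that is not an ASCII letter or digit, then lowercases
    each token. The real standard tokenizer uses Unicode word-boundary rules.
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


terms = standard_analyze("Cor Interface Monitoring")
print(terms)                              # ['cor', 'interface', 'monitoring']
print("interface" in terms)               # True: whole word was indexed
print("inter" in terms)                   # False: no term equals the partial word
print(standard_analyze("*cor inter*"))    # ['cor', 'inter']: the * characters are gone
```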

To match any substring of a word, you need to create a custom analyzer that uses the nGram token filter, so that every substring of each token (between min_gram and max_gram characters long) is indexed as well.

You can create your index like this:

curl -XPUT localhost:9200/process_test_3 -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "substring"]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "14": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}'
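The substring filter above is an nGram token filter with min_gram 2 and max_gram 15. The Python sketch below mimics those settings on a single token so you can see which terms would be indexed. `ngram_filter` is a hypothetical helper, and its output order (by start position, then by length) is a choice made for readability, not a claim about Lucene's internal emission order.

```python
def ngram_filter(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Emit every substring of `token` whose length is in [min_gram, max_gram].

    Hypothetical helper mimicking the nGram filter settings in the index above.
    Grams are ordered by start position, then by increasing length.
    """
    grams = []
    for start in range(len(token)):
        for length in range(min_gram, max_gram + 1):
            end = start + length
            if end > len(token):
                break  # longer grams would run past the end of the token
            grams.append(token[start:end])
    return grams


print(ngram_filter("cor"))             # ['co', 'cor', 'or']
print(ngram_filter("interface")[:5])   # ['in', 'int', 'inte', 'inter', 'interf']
```

In this sketch a token shorter than min_gram yields no grams at all, and raising min_gram shrinks the index at the cost of no longer matching very short fragments.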

Then reindex your data. With this analyzer, the title Cor Interface Monitoring will now be tokenized as:

  • co, cor, or
  • in, int, inte, inter, interf, etc.
  • mo, mon, moni, etc.

As a result, your second search query will now return the document you expect, because the terms cor and inter now exist in the index and will match.
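The effect on matching can be sketched with a toy inverted index in Python: index the title once with a simplified standard analyzer and once with lowercase plus 2 to 15 character n-grams, then look up the query words cor and inter. All helpers here are hypothetical simplifications: the query words are looked up as plain terms rather than being run through the field's analyzer as Elasticsearch would do, and term positions (which a real phrase query checks) are ignored.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified standard tokenizer + lowercase filter (hypothetical helper).
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    # All substrings with length in [min_gram, max_gram], like the substring filter.
    return [token[i:j]
            for i in range(len(token))
            for j in range(i + min_gram, min(i + max_gram, len(token)) + 1)]


def build_index(docs: dict[str, str], use_ngrams: bool) -> dict[str, set[str]]:
    # Map each indexed term to the ids of the documents that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, title in docs.items():
        for token in tokenize(title):
            for term in (ngrams(token) if use_ngrams else [token]):
                index[term].add(doc_id)
    return index


def search_all_terms(index: dict[str, set[str]], query: str) -> set[str]:
    # Return ids of docs containing every query word (positions are ignored).
    postings = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {"141_dashboard_14": "Cor Interface Monitoring"}
print(search_all_terms(build_index(docs, use_ngrams=False), "cor inter"))  # set()
print(search_all_terms(build_index(docs, use_ngrams=True), "cor inter"))   # {'141_dashboard_14'}
```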
