Elasticsearch query string doesn't search by word part

Views: 41  Published: 2021/12/13 11:35:08  Tags: elasticsearch query-string


Question

I am sending this request:

curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"*cor interface*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

and I get the correct result:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 5.421598,
    "hits": [
      {
        "_index": "process_test_3",
        "_type": "14",
        "_id": "141_dashboard_14",
        "_score": 5.421598,
        "_source": {
          "obj_type": "dashboard",
          "obj_id": "141",
          "title": "Cor Interface Monitoring"
        }
      }
    ]
  }
}

But when I want to search by part of a word, for example:

curl -XGET 'host/process_test_3/14/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "\"*cor inter*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

I don't get any results:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : []
  }
}

What am I doing wrong?

Recommended answer

This is because your title field has probably been analyzed by the standard analyzer (the default setting), so the title Cor Interface Monitoring has been tokenized into the three tokens cor, interface and monitoring. The index contains no token inter, which is why your second query finds nothing.
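To make the tokenization concrete, here is a rough Python approximation of what the standard analyzer does to this title. It is only a sketch: the function name standard_like_tokens is hypothetical, and the real analyzer follows Unicode text segmentation rules rather than this simple regular expression.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer:
    split on non-alphanumeric characters, then lowercase each token."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

indexed = standard_like_tokens("Cor Interface Monitoring")
print(indexed)                 # ['cor', 'interface', 'monitoring']
print("interface" in indexed)  # True: the whole word is a token
print("inter" in indexed)      # False: the word part was never indexed
```

Only whole words end up in the index, so a search term has to line up with a whole token to match.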

In order to search for any substring of a word, you need to create a custom analyzer that uses the ngram token filter, so that all substrings of each of your tokens are indexed as well.

You can create your index like this:

curl -XPUT localhost:9200/process_test_3 -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "substring"]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "14": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}'

Then you can reindex your data. With this analyzer, the title Cor Interface Monitoring will now be tokenized as:

co, cor, or
in, int, inte, inter, interf, etc.
mo, mon, moni, etc.

so that your second search query will now return the document you expect, because the tokens cor and inter will now match.
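The effect of the ngram filter can be sketched in Python. This is an illustration rather than Elasticsearch's implementation: the function names ngrams and substring_analyzer are hypothetical, and splitting on whitespace is assumed to be a good enough stand-in for the standard tokenizer on this title. The min_gram and max_gram defaults mirror the index settings above.

```python
def ngrams(token, min_gram=2, max_gram=15):
    """Return every substring of token whose length lies in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer grams from this start would run past the token
            grams.append(token[start:end])
    return grams

def substring_analyzer(text):
    """Lowercase, split on whitespace, then expand each token into its n-grams."""
    tokens = text.lower().split()
    return {gram for token in tokens for gram in ngrams(token)}

indexed = substring_analyzer("Cor Interface Monitoring")
print(sorted(ngrams("cor")))                 # ['co', 'cor', 'or']
print("cor" in indexed, "inter" in indexed)  # True True
```

Indexing every substring makes the index noticeably larger, which is the price of being able to match word parts.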
