In Elasticsearch, how do I search for an arbitrary substring?

Question

In Elasticsearch, how do I search for an arbitrary substring, perhaps including spaces? (Searching for part of a word isn't quite enough; I want to search any substring of an entire field.)

I imagine it has to be in a keyword field, rather than a text field.

Suppose I have only a few thousand documents in my Elasticsearch index, and I try:

  "query": {
         "wildcard" : { "description" : "*plan*" }
  }

That works as expected--I get every item where "plan" is in the description, even ones like "supplantation".

Now, I want to do:

  "query": {
         "wildcard" : { "description" : "*plan is*" }
  }   

...so that I might match documents with "Kaplan isn't" among many other possibilities.

It seems this isn't possible with wildcard, match prefix, or any other query type I might see. How do I simply search on any substring? (In SQL, I would just do description LIKE '%plan is%')

(I am aware any such query would be slow or perhaps even impossible for large data sets.)

Recommended answer

I was hoping Elasticsearch would have something built in for this, given that simple substring search seems like a very basic capability (thinking about it, it is implemented as strstr() in C, LIKE '%%' in SQL, Ctrl+F in most text editors, String.IndexOf in C#, etc.), but this seems not to be the case. What worked for me was the regexp query. Note that the regexp query is case-sensitive by default, so I also needed to pair it with the custom analyzer below, which stores the indexed value in all lowercase. The keyword tokenizer keeps the whole field as a single term, which is what lets a pattern span spaces. Then I can convert my search string to lowercase as well.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": { 
          "type": "custom",
          "tokenizer": "keyword", 
          "filter": [ "lowercase" ] 
        }
      }
    }
  },
  "mappings": { 
     ...
     "description": {"type": "text", "analyzer": "lowercase_keyword"}
  }
}
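
As a rough illustration of why the keyword tokenizer matters, the sketch below imitates the two analysis styles in plain Python. It does not call Elasticsearch: whitespace splitting stands in for a word-splitting analyzer, `fnmatchcase` stands in for the wildcard query, and all function names are made up for this example.

```python
from fnmatch import fnmatchcase


def word_terms(text):
    # Assumption: a crude stand-in for a word-splitting analyzer.
    # Lowercase the text, then split it into one term per word.
    return text.lower().split()


def keyword_terms(text):
    # Stand-in for the keyword tokenizer plus lowercase filter:
    # the whole field value becomes a single lowercase term.
    return [text.lower()]


def any_term_matches(terms, pattern):
    # A wildcard pattern is tested against each term on its own,
    # so it can only span a space if some single term contains one.
    return any(fnmatchcase(term, pattern) for term in terms)


doc = "Dr. Kaplan isn't available"
pattern = "*plan is*"

print(any_term_matches(word_terms(doc), pattern))     # False
print(any_term_matches(keyword_terms(doc), pattern))  # True
```

With word-level terms, no single term contains "plan is", so the pattern cannot match. With one term per field, it can.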

Example query:

  "query": {
         "regexp" : { "description" : ".*plan is.*" }
  }
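
If the search string comes from user input, a small helper can lowercase it and escape regular-expression metacharacters before building the query body. The following is a minimal sketch, not part of the original answer: the function name `substring_query` is hypothetical, and the reserved-character list is my reading of the Lucene regular expression syntax that Elasticsearch uses, so check it against the documentation for your version.

```python
# Characters that are reserved in Lucene regular expressions
# (assumption: standard operators plus the optional ones # @ & < > ~).
LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def substring_query(field, needle):
    """Build a regexp query body that matches `needle` anywhere in `field`.

    Assumes the field is indexed as one lowercase term, as with the
    lowercase_keyword analyzer above.
    """
    lowered = needle.lower()
    # Prefix every reserved character with a backslash so it matches literally.
    escaped = "".join(
        "\\" + ch if ch in LUCENE_REGEX_RESERVED else ch for ch in lowered
    )
    return {"query": {"regexp": {field: ".*" + escaped + ".*"}}}


body = substring_query("description", "Plan is")
print(body["query"]["regexp"]["description"])  # .*plan is.*
```

The returned dictionary can be serialized to JSON and sent as the search request body. Escaping means a search for something like "a.b" only matches a literal dot, not any character.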

Thanks to Jai Sharma for pointing me in the right direction; I just wanted to provide more detail.
