In Elasticsearch, how do I search for an arbitrary substring?

Question

In Elasticsearch, how do I search for an arbitrary substring, perhaps including spaces? (Searching for part of a word isn't quite enough; I want to search any substring of an entire field.)

I imagine it has to be in a keyword field, rather than a text field.

Suppose I have only a few thousand documents in my Elasticsearch index, and I try:

  "query": {
         "wildcard" : { "description" : "*plan*" }
  }

That works as expected--I get every item where "plan" is in the description, even ones like "supplantation".

Now, I want to do:

  "query": {
         "wildcard" : { "description" : "*plan is*" }
  }   

...so that I might match documents with "Kaplan isn't" among many other possibilities.

It seems this isn't possible with wildcard, match prefix, or any other query type I might see. How do I simply search on any substring? (In SQL, I would just do description LIKE '%plan is%')

(I am aware any such query would be slow or perhaps even impossible for large data sets.)

Recommended answer

I was hoping Elasticsearch would have something built in for this, given that simple substring search seems like a very basic capability (thinking about it, it is implemented as strstr() in C, LIKE '%%' in SQL, Ctrl+F in most text editors, String.IndexOf in C#, etc.), but this seems not to be the case. What worked for me was the regexp query shown below. Note that the regexp query doesn't support case insensitivity, so I also needed to pair it with this custom analyzer, which indexes the whole field as a single all-lowercase term. Then I can convert my search string to lowercase as well.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": { 
          "type": "custom",
          "tokenizer": "keyword", 
          "filter": [ "lowercase" ] 
        }
      }
    }
  },
  "mappings": { 
     ...
     "description": {"type": "text", "analyzer": "lowercase_keyword"}
  }
}
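
To illustrate why the keyword tokenizer matters here, the following is a rough Python simulation, not Elasticsearch code. It assumes that a text field with default analysis is stored as separate lowercase word terms (approximated here by splitting on whitespace), that the lowercase_keyword analyzer stores the whole field as one lowercased term, and that term-level queries such as regexp are checked against each stored term on its own. The function names are made up for this sketch.

```python
import re

def word_tokens(text):
    # Rough stand-in for default text analysis: lowercase, split on whitespace.
    return text.lower().split()

def keyword_token(text):
    # Stand-in for the lowercase_keyword analyzer: one lowercased term.
    return [text.lower()]

def regexp_matches(terms, pattern):
    # Lucene regexps are anchored, so the pattern must match a whole term.
    return any(re.fullmatch(pattern, term) for term in terms)

doc = "Mr. Kaplan isn't here"
pattern = ".*plan is.*"

print(regexp_matches(word_tokens(doc), pattern))    # False: no single word contains a space
print(regexp_matches(keyword_token(doc), pattern))  # True: the whole field is one term
```

Under these assumptions, a pattern containing a space can only match once the entire field is indexed as a single term.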

Example query:

  "query": {
         "regexp" : { "description" : ".*plan is.*" }
  }
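
Since the search string has to be lowercased by hand, and since characters such as "." or "*" have special meaning in a regexp query, it may help to build the query body in code. The following is a minimal Python sketch, assuming the field was indexed with the lowercase_keyword analyzer above; the helper name substring_regexp_query is hypothetical, and the reserved-character list is the one documented for Lucene regular expressions.

```python
import json

# Characters with special meaning in Lucene regular expressions
# (standard operators plus the optional ones that are enabled by default).
RESERVED = set('.?+*|{}[]()"\\#@&<>~')

def substring_regexp_query(field, substring):
    """Build a regexp query body matching `substring` anywhere in `field`."""
    # The index holds lowercased terms, so lowercase the search string too.
    lowered = substring.lower()
    # Backslash-escape reserved characters so they are matched literally.
    escaped = "".join("\\" + ch if ch in RESERVED else ch for ch in lowered)
    return {"query": {"regexp": {field: ".*" + escaped + ".*"}}}

body = substring_regexp_query("description", "Plan is")
print(json.dumps(body))
# {"query": {"regexp": {"description": ".*plan is.*"}}}
```

The resulting dictionary can be sent as the JSON body of a search request with whatever HTTP or Elasticsearch client is already in use.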

Thanks to Jai Sharma for leading me; I just wanted to provide more detail.
