In Elasticsearch, how do I search for an arbitrary substring?
Question
In Elasticsearch, how do I search for an arbitrary substring, perhaps including spaces? (Searching for part of a word isn't quite enough; I want to search any substring of an entire field.)
I imagine it has to be in a keyword field, rather than a text field.
Suppose I have only a few thousand documents in my Elasticsearch index, and I try:
"query": {
    "wildcard": { "description": "*plan*" }
}
That works as expected--I get every item where "plan" is in the description, even ones like "supplantation".
Now, I would like to do:
"query": {
    "wildcard": { "description": "*plan is*" }
}
...so that I might match documents with "Kaplan isn't" among many other possibilities.
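The reason the second query behaves differently can be sketched outside Elasticsearch. The wildcard query is a term-level query, so on an analyzed text field the pattern is compared against each indexed term separately, and no single term contains a space. The snippet below is only an illustration under stated assumptions: crude_terms is a hypothetical stand-in for an analyzer (lowercase, split on whitespace), and Python's fnmatch is used in place of the real wildcard matcher.

```python
from fnmatch import fnmatchcase


def crude_terms(text):
    # Hypothetical stand-in for an analyzer: lowercase, then split on whitespace.
    # A real Elasticsearch analyzer tokenizes differently in the details.
    return text.lower().split()


def wildcard_on_analyzed_field(text, pattern):
    # Term-level matching: the pattern is tested against each term on its own.
    return any(fnmatchcase(term, pattern) for term in crude_terms(text))


def wildcard_on_whole_value(text, pattern):
    # Matching against the entire lowercased value, kept as one single term.
    return fnmatchcase(text.lower(), pattern)


doc = "Mr. Kaplan isn't here"
print(wildcard_on_analyzed_field(doc, "*plan*"))     # True
print(wildcard_on_analyzed_field(doc, "*plan is*"))  # False
print(wildcard_on_whole_value(doc, "*plan is*"))     # True
```

A pattern that spans a space can only match when the whole field value is indexed as one term.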
It seems this isn't possible with wildcard, match prefix, or any other query type I have come across. How do I simply search on any substring? (In SQL, I would just do description LIKE '%plan is%'.)
(I am aware any such query would be slow or perhaps even impossible for large data sets.)
Recommended answer
I was hoping there might be something built in for this in Elasticsearch, given that simple substring search seems like a very basic capability (thinking about it, it is implemented as strstr() in C, LIKE '%%' in SQL, Ctrl+F in most text editors, String.IndexOf in C#, etc.), but this seems not to be the case. The regexp query does the job, though. Note that the regexp query doesn't support case insensitivity, so I also needed to pair it with this custom analyzer, which indexes the whole field value as a single all-lowercase term. Then I can convert my search string to lowercase as well.
{
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        ...
        "description": {"type": "text", "analyzer": "lowercase_keyword"}
    }
}
Example query:
"query": {
    "regexp": { "description": ".*plan is.*" }
}
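If the search string comes from user input, the two steps described above (lowercase the string, wrap it in .* on both sides) can be sketched in a small helper. This is a minimal sketch, not part of the original answer: the function names are hypothetical, and the escaping assumes the reserved characters of Lucene regular expression syntax, so that input such as "1.5" is matched literally rather than as a pattern.

```python
import json

# Characters assumed to be reserved in Lucene regular expression syntax
# (including the optional operators); each gets a backslash prefix.
LUCENE_REGEXP_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regexp(text):
    # Hypothetical helper: escape reserved characters so they match literally.
    return "".join("\\" + ch if ch in LUCENE_REGEXP_RESERVED else ch for ch in text)


def substring_regexp_query(field, needle):
    # Lowercase the needle to match the all-lowercase indexed term, then
    # surround it with .* because the pattern must match the whole term.
    pattern = ".*" + escape_lucene_regexp(needle.lower()) + ".*"
    return {"query": {"regexp": {field: pattern}}}


print(json.dumps(substring_regexp_query("description", "Plan is")))
# {"query": {"regexp": {"description": ".*plan is.*"}}}
```

The resulting dictionary can be sent as the request body of a search against the index defined above.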
Thanks to Jai Sharma for leading me; I just wanted to provide more detail.