How do I do a partial match in Elasticsearch?
Question
I have a link like http://drive.google.com and I want to match "google" out of the link.
I have:
query: {
  bool: {
    must: {
      match: { text: 'google' }
    }
  }
}
But this only matches if the whole text is 'google' (case-insensitively, so it also matches Google, GooGlE, etc.). How do I match 'google' inside another string?
Answer
The point is that Elasticsearch regular expressions require a full-string match:
"Lucene's patterns are always anchored. The pattern provided must match the entire string."
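Thus, to match any characters (but a newline) before and after the substring, wrap it in .* and send it as a regexp query (a match query does not interpret regular expressions):
regexp: { text: '.*google.*' }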
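The anchoring rule can be illustrated locally with Python's re.fullmatch, which, like Lucene, requires the pattern to cover the entire string. This is only a stand-in: it does not call Elasticsearch, the two regex flavors differ in other respects, and anchored_match is a hypothetical helper name.

```python
import re

def anchored_match(pattern: str, text: str) -> bool:
    """Mimic an always-anchored regex engine: the pattern must match the entire text."""
    return re.fullmatch(pattern, text) is not None

url = "http://drive.google.com"
print(anchored_match(r"google", url))         # False: "google" alone is not the whole string
print(anchored_match(r".*google.*", url))     # True: .* absorbs "http://drive." and ".com"
print(re.search(r"google", url) is not None)  # True: an unanchored search would find it
```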
One more variation is for cases when your string can contain newlines:
regexp: { text: '(.|\n)*google(.|\n)*' }
This awful (.|\n)* is a must in Elasticsearch because this regex flavor does not allow any [\s\S] workarounds, nor any DOTALL/Singleline flags: "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
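If the substring comes from user input, characters such as . would otherwise be read as regex operators. Below is a minimal Python sketch that builds the regexp request body with escaping. build_regexp_query and escape_lucene_regex are hypothetical helpers, not part of any Elasticsearch client; the reserved-character set is the one listed in the Elasticsearch regular expression syntax reference, including the optional operators enabled by the default ALL flags. It uses the .* form; swap in (.|\n)* for the newline-tolerant variant.

```python
import json

# Reserved characters in Lucene regexp syntax, including the optional
# operators (# @ & < > ~) that the default "ALL" flags enable.
LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')

def escape_lucene_regex(literal: str) -> str:
    """Backslash-escape reserved characters so `literal` is matched verbatim."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_RESERVED else ch for ch in literal)

def build_regexp_query(field: str, substring: str) -> dict:
    """Hypothetical helper: a regexp query for terms that contain `substring`."""
    pattern = ".*" + escape_lucene_regex(substring) + ".*"
    return {"query": {"regexp": {field: {"value": pattern}}}}

print(json.dumps(build_regexp_query("text", "google")))
# {"query": {"regexp": {"text": {"value": ".*google.*"}}}}
```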
However, if you do not need complex patterns or word-boundary checks, searching for a plain substring is better done with a wildcard query than with a regex:
{
  "query": {
    "wildcard": {
      "text": {
        "value": "*google*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
See Wildcard search for more details.
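A minimal Python sketch of building that request body for an arbitrary user-supplied substring. build_wildcard_query and escape_wildcard are hypothetical helpers, not part of any Elasticsearch client, and the escaping assumes that Lucene wildcard syntax treats a backslash as the escape character.

```python
import json

def escape_wildcard(literal: str) -> str:
    """Backslash-escape the wildcard operators * and ? (and the backslash itself)."""
    return "".join("\\" + ch if ch in "*?\\" else ch for ch in literal)

def build_wildcard_query(field: str, substring: str) -> dict:
    """Hypothetical helper producing a "contains" wildcard query body."""
    return {
        "query": {
            "wildcard": {
                field: {
                    "value": "*" + escape_wildcard(substring) + "*",
                    "boost": 1.0,
                    "rewrite": "constant_score",
                }
            }
        }
    }

print(json.dumps(build_wildcard_query("text", "google"), indent=2))
```

The Elasticsearch documentation advises against patterns that begin with * or ?, because they increase the number of terms that must be checked and can slow searches; a *google* query has exactly that shape, so it is worth keeping an eye on its cost for large indexes.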
NOTE: The wildcard pattern also has to match the whole input string (that is, the whole indexed term), so:
- google* finds all strings starting with google
- *google* finds all strings containing google
- *google finds all strings ending with google
Also, bear in mind that wildcard patterns have only two special characters:
- ?, which matches any single character
- *, which matches zero or more characters, so it can also match the empty string
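The three patterns and the two operators above can be checked locally by translating wildcard syntax into a Python regex and matching it with re.fullmatch. This is a sketch under assumptions: it models only *, ? and backslash escaping on a single string, whereas Elasticsearch applies the pattern to each indexed term, and wildcard_to_regex and wildcard_matches are hypothetical names.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern into an equivalent Python regex string.

    * becomes .* (zero or more characters), ? becomes . (exactly one character).
    A backslash makes the following character literal (assumed escape rule).
    Everything else is escaped so it matches literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)

def wildcard_matches(pattern: str, term: str) -> bool:
    # fullmatch: like the wildcard query, the pattern must cover the whole term.
    # DOTALL lets * and ? consume any character (an assumption of this sketch).
    return re.fullmatch(wildcard_to_regex(pattern), term, flags=re.DOTALL) is not None

for pattern in ("google*", "*google*", "*google"):
    print(pattern, wildcard_matches(pattern, "drive.google.com"))
# google* False
# *google* True
# *google False
```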