How do I do a partial match in Elasticsearch?


Question

I have a link like http://drive.google.com and I want to match "google" out of the link.

I have:

query: {
    bool : {
        must: {
            match: { text: 'google'} 
        }
    }
}

But this only matches if the whole text is 'google' (case-insensitively, so it also matches Google, GooGlE, etc.). How do I match 'google' inside another string?

Recommended answer

Note that a match query does not interpret regex syntax; patterns belong in a regexp query. The point is that Elasticsearch regular expressions require a full string match:

"Lucene's patterns are always anchored. The pattern provided must match the entire string."

Thus, to allow any characters (other than a newline) around the word, you can use the .* pattern:

regexp: { text: '.*google.*'}
                 ^^      ^^

One more variation is for cases when your string can contain newlines: regexp: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in Elasticsearch because this regex flavor does not allow any [\s\S] workarounds, nor any DOTALL/Singleline flags. "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
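
The anchoring rule can be illustrated outside Elasticsearch. The sketch below uses Python's re.fullmatch as a stand-in for an anchored engine. Python's regex flavor is not Lucene's, so this only demonstrates the whole-string requirement, not Elasticsearch behavior in general. The sample value is the URL from the question.

```python
import re

# Analogy only: re.fullmatch requires the pattern to cover the whole string,
# which mirrors the anchoring described above. Python's flavor is not Lucene's.
url = "http://drive.google.com"

bare = re.fullmatch(r"google", url)        # pattern must span the entire string
padded = re.fullmatch(r".*google.*", url)  # .* absorbs the text on both sides

print(bare is None)        # True: the bare word does not match the whole URL
print(padded is not None)  # True: the padded pattern does
```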

However, if you do not plan to match any complicated patterns and need no word boundary checking, a search for a plain substring is better done with a wildcard query than with a regex:

{
    "query": {
        "wildcard": {
            "text": {
                "value": "*google*",
                "boost": 1.0,
                "rewrite": "constant_score"
            }
        }
    }
} 

See Wildcard search for more details.
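
If the search term comes from user input, the request body can be assembled programmatically. The following is a minimal sketch, not an official client API: contains_query is a hypothetical helper, and escaping literal * and ? with a backslash is an assumption about the wildcard syntax that should be checked against the documentation for your version. Sending the body to a cluster is left out.

```python
import json

def contains_query(field: str, term: str) -> dict:
    """Build a wildcard query body that matches values containing `term`."""
    # Assumption: a backslash escapes the wildcard metacharacters.
    # Escape the backslash itself first, then * and ?.
    escaped = term.replace("\\", "\\\\").replace("*", "\\*").replace("?", "\\?")
    return {"query": {"wildcard": {field: {"value": f"*{escaped}*"}}}}

body = contains_query("text", "google")
print(json.dumps(body, indent=4))
```

The boost and rewrite options from the query above are omitted here to keep the sketch short; they can be added to the inner dictionary in the same way.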

NOTE: The wildcard pattern also needs to match the whole input string, thus

  • google* finds all strings starting with google
  • *google* finds all strings containing google
  • *google finds all strings ending with google

Also, bear in mind the only two special characters in wildcard patterns:

?, which matches any single character
*, which matches zero or more characters, including an empty sequence
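
The three pattern shapes and the two special characters can be tried locally. This sketch uses Python's fnmatch.fnmatchcase purely as an analogy: it also matches against the whole string and gives ? and * the same meaning, but it additionally supports [seq] character classes, which are not among the wildcard characters listed above. The sample strings are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical sample values; fnmatchcase is an analogy, not Elasticsearch.
samples = ["google", "google.com", "drive.google.com", "mygoogle"]

def matching(pattern: str) -> list:
    """Return the samples that the whole-string wildcard pattern matches."""
    return [s for s in samples if fnmatchcase(s, pattern)]

print(matching("google*"))   # strings starting with google
print(matching("*google*"))  # strings containing google
print(matching("*google"))   # strings ending with google
print(matching("googl?"))    # googl followed by exactly one character
```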
