Negative lookahead regex on Elasticsearch

Question

I'm trying to use a negative lookahead in an Elasticsearch query. The regex is:

(?!.*charge)(?!.*encode)(?!.*relate).*night.*

The text I'm matching against is:

credited back on night stay, still having issues with construction. causing health issues due to a chemical being sprayed and causes eyes to irritated.

I haven't had any luck. Can someone give me a hand?

ES query:

{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must_not": [
            {
              "regexp": {
                "message": {
                  "value": "(?!.*charge)(?!.*encode)(?!.*relate).*night.*",
                  "flags_value": 65535
                }
              }
            }
          ]
        }
      },
      "filter": {
        "match": {
          "resNb": {
            "query": "462031152161",
            "type": "boolean"
          }
        }
      }
    }
  }
}

Answer

Solution

You can solve the issue with either of the two:

"value": "~(charge|encode|relate)night~(charge|encode|relate)",

.*night.*&~(.*(charge|encode|relate).*)

Optionally, you can also set the flags explicitly (ALL is the default anyway):

"flags" : "ALL"
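To put the pieces in context, here is a minimal Python sketch that assembles a request body around the second pattern. It rests on a few assumptions. First, `message` is indexed as a `keyword` field: a regexp query is matched against the indexed terms, so on an analyzed `text` field it would test individual tokens rather than the whole message. Second, you want matching documents in a `bool` `must` clause rather than `must_not`. Third, the question's `filtered` query is replaced with `bool` + `filter`, because `filtered` was removed in Elasticsearch 5.0. The helper name `build_regexp_query` is hypothetical.

```python
import json

# Pattern 2 from the answer: contains "night" AND contains none of the three words.
PATTERN = ".*night.*&~(.*(charge|encode|relate).*)"


def build_regexp_query(field: str, pattern: str, res_nb: str) -> dict:
    """Hypothetical helper: build a bool query body with a Lucene regexp clause.

    Assumes `field` is a keyword (non-analyzed) field, so the regexp is
    tested against the whole value rather than individual tokens.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "regexp": {
                            field: {
                                "value": pattern,
                                # ALL is the default; spelled out for clarity.
                                "flags": "ALL",
                            }
                        }
                    }
                ],
                # Non-scoring filter, replacing the old "filtered" query.
                "filter": [{"match": {"resNb": res_nb}}],
            }
        }
    }


if __name__ == "__main__":
    body = build_regexp_query("message", PATTERN, "462031152161")
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request.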

How does it work?

In common NFA regex flavors, you usually restrict a more generic pattern with negative lookarounds, (?!...) or (?<!...). Elasticsearch's regexp syntax does not support lookarounds, so you need to use its specific optional operators instead.

The ~ (tilde) is the complement operator: it negates the atom right after it. An atom is either a single symbol or a group of subpatterns/alternatives inside parentheses.

Note that all ES patterns are anchored at the start and end of the string by default, so you never need the ^ and $ anchors that are common in Perl-like, .NET and other NFA regex flavors.
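As an analogy only (this is Python's regex engine, not Lucene's), `re.fullmatch` behaves like an implicitly anchored Elasticsearch pattern, while `re.search` behaves like an unanchored Perl-style match. This is why the answer's patterns are padded with `.*`:

```python
import re

text = ("credited back on night stay, still having issues with construction. "
        "causing health issues due to a chemical being sprayed and causes eyes to irritated.")

# Unanchored search: finds "night" anywhere in the text.
found_anywhere = re.search(r"night", text) is not None

# Whole-string match, like an implicitly anchored Elasticsearch pattern:
# "night" alone only matches a value that is exactly "night".
bare_word = re.fullmatch(r"night", text) is not None

# Padding with .* on both sides lets the anchored pattern cover the whole value.
padded = re.fullmatch(r".*night.*", text) is not None

print(found_anywhere, bare_word, padded)  # True False True
```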

So, in the first pattern:

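  • ~(charge|encode|relate) - matches any text from the start of the string except the exact strings charge, encode or relate
  • night - matches the word night
  • ~(charge|encode|relate) - matches any text up to the end of the string except the exact strings charge, encode or relate.

Because ~ complements the group as a whole, the prefix and suffix are rejected only when they are exactly one of those words. Longer text that merely contains them (for example, a suffix like " stay, extra charge") is still accepted. If the words must not appear anywhere in the string, use the second pattern.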
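Because ~ negates a whole group, the first pattern is easy to misread. The plain-Python function below is a hand-written model of `~(charge|encode|relate)night~(charge|encode|relate)`. It is not Lucene's engine, and the function name is hypothetical. The model says the string matches if some occurrence of night splits it into a prefix and a suffix that both differ from the three exact words:

```python
EXCLUDED = {"charge", "encode", "relate"}


def matches_pattern1(s: str) -> bool:
    """Model of ~(charge|encode|relate)night~(charge|encode|relate).

    ~(...) matches any string (including the empty string) except the
    exact alternatives, so we try every occurrence of "night" as the
    middle part and check the prefix and suffix.
    """
    start = s.find("night")
    while start != -1:
        prefix, suffix = s[:start], s[start + len("night"):]
        if prefix not in EXCLUDED and suffix not in EXCLUDED:
            return True
        start = s.find("night", start + 1)
    return False


print(matches_pattern1("night stay"))                # True
print(matches_pattern1("nightcharge"))               # False: suffix is exactly "charge"
print(matches_pattern1("night stay, extra charge"))  # True: suffix is not exactly "charge"
```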

In an NFA flavor such as Perl's, you could write the intended pattern using a tempered greedy token:

/^(?:(?!charge|encode|relate).)*night(?:(?!charge|encode|relate).)*$/
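Python's `re` module supports lookaheads, so the tempered greedy token can be tried there directly. Python is used here only as a convenient NFA engine. This is not how Elasticsearch evaluates the query:

```python
import re

# Tempered greedy token: each character is consumed only if none of the
# excluded words starts at that position.
TEMPERED = re.compile(
    r"^(?:(?!charge|encode|relate).)*night(?:(?!charge|encode|relate).)*$"
)

question_text = ("credited back on night stay, still having issues with construction. "
                 "causing health issues due to a chemical being sprayed and causes eyes to irritated.")

print(bool(TEMPERED.search(question_text)))               # True
print(bool(TEMPERED.search("night stay, extra charge")))  # False: "charge" blocks the token
print(bool(TEMPERED.search("day stay")))                  # False: no "night"
```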

The second pattern is trickier. Common NFA regex engines do not jump from location to location while matching, so restrictions like this are usually written as lookaheads anchored at the start of the text. In Elasticsearch, INTERSECTION (&) lets you simply combine two patterns: the string must match the first one and must also match the second one.

  • .*night.* - matches the whole line containing night (as . matches any symbol but a newline; otherwise, use (.|\n)*)
  • & - and
  • ~(.*(charge|encode|relate).*) - a line that does not contain the substrings charge, encode or relate.
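Python's `re` has no intersection or complement operators, but their meaning can be modeled with two whole-string matches combined by boolean logic. The function name below is hypothetical. Note that Python's `.` does not match newlines unless `re.DOTALL` is set:

```python
import re

HAS_NIGHT = re.compile(r".*night.*")                      # left operand of &
HAS_EXCLUDED = re.compile(r".*(charge|encode|relate).*")  # operand of ~


def matches_pattern2(s: str) -> bool:
    """Model of .*night.*&~(.*(charge|encode|relate).*).

    A&B matches when the whole string matches both A and B;
    ~X matches when the whole string does NOT match X.
    """
    return HAS_NIGHT.fullmatch(s) is not None and HAS_EXCLUDED.fullmatch(s) is None


print(matches_pattern2("credited back on night stay"))  # True
print(matches_pattern2("night stay, extra charge"))     # False
print(matches_pattern2("nightcharge"))                  # False
```

Unlike the first pattern, this rejects the excluded words wherever they occur.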

An equivalent Perl-like NFA regex would look like this:

/^(?!.*(charge|encode|relate)).*night.*$/
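As a quick sanity check, the lookahead version can be run in Python's `re` and compared with the plain-language rule it encodes: "contains night and none of the three words". The sample strings below are made up for illustration:

```python
import re

LOOKAHEAD = re.compile(r"^(?!.*(charge|encode|relate)).*night.*$")
EXCLUDED = ("charge", "encode", "relate")

samples = [
    "credited back on night stay, still having issues with construction.",
    "night stay, extra charge",
    "relate to the night shift",
    "day stay only",
]

for s in samples:
    by_regex = LOOKAHEAD.search(s) is not None
    # Reference rule written with plain substring checks.
    by_rule = "night" in s and not any(word in s for word in EXCLUDED)
    assert by_regex == by_rule, s
    print(by_regex, s)
```

Only the first sample passes. The lookahead is evaluated once at the start of the string, and its `.*` scans ahead for any excluded word before the main `.*night.*` part is tried.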
