Negative lookahead regex on Elasticsearch

Question

I'm trying to do a negative lookahead in an Elasticsearch query. The regex is:

(?!.*charge)(?!.*encode)(?!.*relate).*night.*

The text I want to match is:

credited back on night stay, still having issues with construction. causing health issues due to a chemical being sprayed and causes eyes to irritated.

I haven't had any luck. Can someone give me a hand?

ES query:

  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must_not": [
            {
              "regexp": {
                "message": {
                  "value": "(?!.*charge)(?!.*encode)(?!.*relate).*night.*",
                  "flags_value": 65535
                }
              }
            }
          ]
        }
      },
      "filter": {
        "match": {
          "resNb": {
            "query": "462031152161",
            "type": "boolean"
          }
        }
      }
    }
  }

Answer

Solution

You can solve the issue with either of these two patterns:

"value": "~(charge|encode|relate)night~(charge|encode|relate)",

.*night.*&~(.*(charge|encode|relate).*)

optionally with the following flag (optional because it is ON by default):

"flags" : "ALL"

How does it work?

In common NFA regular expressions, you usually have negative lookarounds, such as (?!...) or (?<!...), that help restrict a more generic pattern. However, in Elasticsearch, you need to use specific optional operators.
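To make the optional operators concrete, here is a minimal Python sketch that builds the regexp clause as a plain dictionary. The field name message and the pattern come from the question and the answer; the helper build_regexp_clause is a hypothetical name used only for illustration, and the sketch assumes an Elasticsearch version whose regexp syntax supports the & and ~ operators.

```python
import json

# Pattern from the answer: contains "night" AND does not contain
# any of the three excluded substrings.
PATTERN = ".*night.*&~(.*(charge|encode|relate).*)"


def build_regexp_clause(field: str, pattern: str) -> dict:
    """Return a regexp query clause for `field` (hypothetical helper)."""
    # "flags": "ALL" enables the optional operators; the answer notes
    # that this is already the default, so it is spelled out only for clarity.
    return {"regexp": {field: {"value": pattern, "flags": "ALL"}}}


clause = build_regexp_clause("message", PATTERN)
print(json.dumps(clause, indent=2))
```

The printed JSON can be placed wherever the original query had its regexp clause.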

The ~ (tilde) is the complement operator, which negates the atom right after it. An atom is either a single symbol or a set of subpatterns/alternatives inside a group.

Note that all ES patterns are anchored at the start and end of the string by default, so you never need the ^ and $ anchors that are common in Perl-like, .NET, and other NFA regex flavors.

So:

  • ~(charge|encode|relate) - matches any text from the start of the string other than charge, encode, or relate
  • night - matches the word night
  • ~(charge|encode|relate) - matches any text other than any of the 3 substrings, up to the end of the string.

In an NFA regex flavor like Perl's, you could write that pattern using a tempered greedy token:

/^(?:(?!charge|encode|relate).)*night(?:(?!charge|encode|relate).)*$/
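The Perl-style pattern above can be tried out in Python, whose re module supports the same lookahead syntax. This is only a sketch for experimenting with the tempered greedy token on whole strings; it does not run against Elasticsearch, and the sample strings other than the question's text are made up for illustration.

```python
import re

# Tempered greedy token: each "." is consumed only if none of the
# excluded words starts at that position.
TEMPERED = re.compile(
    r"^(?:(?!charge|encode|relate).)*night(?:(?!charge|encode|relate).)*$"
)


def tempered_match(text: str) -> bool:
    """Return True if the whole string satisfies the tempered pattern."""
    return TEMPERED.search(text) is not None


# Text from the question.
SAMPLE = (
    "credited back on night stay, still having issues with construction. "
    "causing health issues due to a chemical being sprayed and causes "
    "eyes to irritated."
)

print(tempered_match(SAMPLE))          # text from the question
print(tempered_match("charge night"))  # excluded word before "night"
print(tempered_match("night encode"))  # excluded word after "night"
```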

The second pattern is trickier. Common NFA regexes usually do not jump from location to location when matching, so lookaheads anchored at the start of the text are commonly used. Here, using an INTERSECTION, we can simply combine 2 patterns that must both match the whole string.

  • .*night.* - matches the whole line with night in it (as . matches any symbol but a newline; otherwise, use (.|\n)*)
  • & - and
  • ~(.*(charge|encode|relate).*) - a line that contains none of the substrings charge, encode, or relate.

The equivalent Perl-like NFA regex would look like:

/^(?!.*(charge|encode|relate)).*night.*$/
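As a rough check of the reasoning, the following Python sketch compares the lookahead form with an emulation of the intersection-and-complement form. The emulation is an assumption about how the two ES sub-patterns combine (both must match the entire string, and ~ inverts a whole-string match); it runs on plain Python strings without newlines, not on an Elasticsearch index, and the sample strings other than the question's text are invented.

```python
import re

EXCLUDED = "charge|encode|relate"

# Perl-like form: a lookahead anchored at the start rejects any string
# that contains one of the excluded substrings.
LOOKAHEAD = re.compile(rf"^(?!.*({EXCLUDED})).*night.*$")


def lookahead_match(text: str) -> bool:
    return LOOKAHEAD.search(text) is not None


def intersection_match(text: str) -> bool:
    """Emulate `.*night.*&~(.*(charge|encode|relate).*)` on one string."""
    has_night = re.fullmatch(r".*night.*", text) is not None
    has_excluded = re.fullmatch(rf".*({EXCLUDED}).*", text) is not None
    # "&" requires both sides to match; "~" inverts the second side.
    return has_night and not has_excluded


SAMPLES = [
    # Text from the question.
    "credited back on night stay, still having issues with construction. "
    "causing health issues due to a chemical being sprayed and causes "
    "eyes to irritated.",
    "one night, extra charge applied",
    "related to the night shift",
    "day stay only",
]

for s in SAMPLES:
    print(lookahead_match(s), intersection_match(s))
```

On these samples the two functions agree, which is what the explanation above suggests.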
