Regex character class subtraction with negative groups
Question
This question relates to character class subtraction in regular expressions (regex). I am referring to the regex flavour of XPath 2.0, second edition.
When there is a negative group within a character class subtraction, does the subtraction operator (-) apply before or after the negative group operator (^)?
The text of the XPath/XML Schema specification is below, but to my mind it reads ambiguously.
For any ·positive character group· or ·negative character group· G, and any ·character class expression· C, G-C is a valid ·character class subtraction·, identifying the set of all characters in C(G) that are not also in C(C).
To be more specific, consider the following three regexes:
- [^abc-[ad]]
- [^abc-[^ad]]
- [abc-[^ad]]
being matched against the haystack text of:
- abcdef
What are the possible match texts (first and subsequent)?
Recommended answer
I don't think that text is ambiguous, if we are lenient enough to read G-C as [G-[C]], and a negative group, ^G, as [^G]. Read that way, it is clear that the caret is part of the first group only and does not negate both groups.
So, [^abc-[ad]] would match:
{all characters besides a, b and c} \ {a and d} = {all characters besides a, b, c and d}
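The same reading can be checked against the three regexes from the question. The following is a minimal sketch, not an XPath implementation: Python's built-in `re` module has no class subtraction syntax, so the helper `subtract` (a hypothetical name introduced here) emulates [G-[C]] as a negative lookahead for C followed by the group G. This assumes the reading given above, where the caret belongs only to the group it sits in.

```python
import re

HAYSTACK = "abcdef"

def subtract(group: str, subtrahend: str) -> str:
    """Emulate [group-[subtrahend]]: a character in `group` that is not in `subtrahend`.

    Both arguments are the inside of a bracket expression, e.g. "^abc" or "ad".
    This is an illustrative helper, not part of any regex library.
    """
    return f"(?![{subtrahend}])[{group}]"

# The three regexes from the question, rewritten with the lookahead emulation.
patterns = {
    "[^abc-[ad]]": subtract("^abc", "ad"),
    "[^abc-[^ad]]": subtract("^abc", "^ad"),
    "[abc-[^ad]]": subtract("abc", "^ad"),
}

# Collect every single-character match in the haystack, in order.
results = {name: re.findall(rx, HAYSTACK) for name, rx in patterns.items()}

for name, matches in results.items():
    print(name, "->", matches)
```

Under this emulation, the first regex matches e and then f, the second matches only d, and the third matches only a.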
Keep in mind, you can easily test this to see the behavior :)
As a bonus, .NET regular expressions also support this feature, making it a little easier to test online.
See also: Character Class Subtraction