What does the "[^][]" regex mean?
Question
I found it in the following regex:
\[(?:[^][]|(?R))*\]
It matches square brackets (with their content), including nested square brackets.
Recommended answer
[^][] is a character class that means all characters except [ and ].
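As a small illustration, the class can be tried in Python, one of the flavours that accepts this unescaped form. The helper name strip_brackets is made up for this sketch.

```python
import re

# [^][] is a negated class whose two members are the literal ] and [
NOT_A_BRACKET = re.compile(r'[^][]')

def strip_brackets(text):
    """Keep only the characters that are neither [ nor ]."""
    return ''.join(NOT_A_BRACKET.findall(text))

print(strip_brackets('a[b[c]d]e'))  # prints abcde
```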
You can avoid escaping the special characters [ and ] here because the class is not ambiguous for PCRE, the regex engine used by the preg_ functions.
Since [^] is incorrect in PCRE, the only way for the regex to parse is for the first ] to be a literal inside a character class that is closed later. The same holds for the [ that follows: a character class cannot be reopened inside a character class (except for a POSIX character class such as [:alnum:]). The last ] is then unambiguous; it is the end of the character class. However, a [ outside a character class must be escaped, since it would otherwise be parsed as the beginning of a character class.
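The parsing argument can be checked with a short experiment. This sketch uses Python's re module as a stand-in, on the assumption that it treats a leading ] the same way PCRE does for these patterns; the helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# In [^] the ] is taken as a literal member, so the class is never closed.
print(compiles(r'[^]'))    # prints False
# In [^][] the first ] and the [ are literals; the final ] closes the class.
print(compiles(r'[^][]'))  # prints True
```

This only shows that the short form is rejected and the full form is accepted in Python; the error reported by PCRE itself may be worded differently.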
In the same way, you can write []] or [[] or [^[] without escaping the [ or ] in the character class.
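Two of these classes are easy to try in Python. The function names below are invented for the sketch. [[] is left out because Python may emit a FutureWarning about a possible nested set for it, although it still compiles.

```python
import re

def closing_brackets(text):
    """Find every ] using the class []] (a leading ] is a literal)."""
    return re.findall(r'[]]', text)

def not_opening_bracket(text):
    """Find every character that is not [ using the class [^[]."""
    return re.findall(r'[^[]', text)

print(closing_brackets('a]b]'))    # prints [']', ']']
print(not_opening_bracket('[a]'))  # prints ['a', ']']
```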
Note: since PHP 7.3, you can use the inline xx modifier, which allows blank characters to be ignored even inside character classes. This way you can write these classes in a less ambiguous way: (?xx) [^ ][ ] [ ] ] [ [ ] [^ [ ]
You can use this syntax with several regex flavours: PCRE (PHP, R), Perl, Python, Java, .NET, Go, awk, Tcl (if you delimit your pattern with curly brackets, thanks Donal Fellows), ...
But not with: Ruby, JavaScript (except for IE < 9), ...
As m.buettner noted, [^]] is not ambiguous because ] is the first character, whereas [^a]] is read as any character that is not an a, followed by a ]. To exclude both a and ], you must write [^a\]] or [^]a]
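The difference between the three spellings shows up on a tiny input. This is a sketch in Python; the sample string and variable names are chosen only for illustration.

```python
import re

SAMPLE = 'ab]a]'

# [^a]] : one character that is not a, then a literal ]
class_then_bracket = re.findall(r'[^a]]', SAMPLE)

# [^a\]] : one character that is neither a nor ] (escaped ])
escaped_form = re.findall(r'[^a\]]', SAMPLE)

# [^]a] : the same class, with ] placed first so no escape is needed
leading_form = re.findall(r'[^]a]', SAMPLE)

print(class_then_bracket)  # prints ['b]']
print(escaped_form)        # prints ['b']
print(leading_form)        # prints ['b']
```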
In the particular case of JavaScript, the specification allows [] as a regex token that never matches (in other words, [] always fails) and [^] as a regex that matches any character. [^]] is then read as any character followed by a ]. Actual implementations vary, but modern browsers generally stick to the definition in the specification.
Pattern details:
\[           # a literal [
(?:          # open a non-capturing group
    [^][]    # a character that is not a ] or a [
  |          # OR
    (?R)     # the whole pattern (here is the recursion)
)*           # repeat zero or more times
\]           # a literal ]
In your pattern example, you don't need to escape the last ]
But you can do the same with this slightly optimized pattern, which is also more useful because it can be reused as a subpattern (with the (?-1)): (\[(?:[^][]+|(?-1))*+])
(                 # open the capturing group
    \[            # a literal [
    (?:           # open a non-capturing group
        [^][]+    # all characters but ] or [, one or more times
      |           # OR
        (?-1)     # the last opened capturing group (recursion)
                  # (the capture group where you are)
    )*+           # repeat the group zero or more times (possessive)
    ]             # a literal ] (no need to escape)
)                 # close the capturing group
Or better: (\[[^][]*(?:(?-1)[^][]*)*+]) which avoids the cost of an alternation.
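Python's standard re module has no (?R) or (?-1), so the recursion cannot be run there directly. As a rough sketch of what the recursive patterns do, here is a hand-written matcher that follows the same steps: a literal [, then any mix of non-bracket characters and nested groups, then a literal ]. The function names match_balanced and find_balanced are hypothetical and are not part of any library.

```python
def match_balanced(text, start=0):
    """Return the index just past the balanced [...] group that begins
    at `start`, or None if no balanced group begins there."""
    if start >= len(text) or text[start] != '[':
        return None                      # the literal [ is required
    pos = start + 1
    while pos < len(text):
        ch = text[pos]
        if ch == ']':
            return pos + 1               # the closing literal ]
        if ch == '[':
            end = match_balanced(text, pos)  # plays the role of (?R)
            if end is None:
                return None              # nested group never closes
            pos = end
        else:
            pos += 1                     # plays the role of [^][]
    return None                          # ran out of input before ]

def find_balanced(text):
    """Collect every non-overlapping balanced group, scanning left to right."""
    groups = []
    pos = 0
    while pos < len(text):
        end = match_balanced(text, pos)
        if end is None:
            pos += 1
        else:
            groups.append(text[pos:end])
            pos = end
    return groups

print(find_balanced('a[b[c]d]e[f]'))  # prints ['[b[c]d]', '[f]']
```

An unbalanced opening bracket is skipped and the scan resumes after it, so find_balanced('[a[b]') yields only the inner group.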