What does the "[^][]" regex mean?
Question
I found it in the following regex:
\[(?:[^][]|(?R))*\]
It matches square brackets (with their content) together with nested square brackets.
Recommended answer
[^][] is a character class that matches any single character except [ and ].
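As a quick illustration, here is a minimal sketch in Python, whose standard re module is one of the flavours that accepts this class unescaped. The sample strings are made up for the example.

```python
import re

# [^][] is a negated class whose two members are the literals ] and [
non_bracket = re.compile(r"[^][]")

# Keep only the characters that the class accepts
kept = [ch for ch in "a[bc]d" if non_bracket.fullmatch(ch)]
print(kept)

# With a + quantifier, the class collects runs of non-bracket characters
runs = re.findall(r"[^][]+", "x[yz[w]]")
print(runs)
```

Both brackets are skipped in each case; only the characters between them are returned.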
You do not need to escape the special characters [ and ] here, because the class is not ambiguous for PCRE, the regex engine used by the preg_ functions.
Since [^] is not valid in PCRE, the only way for the regex to parse is to treat the first ] as a literal inside the character class, which is closed later. The same goes for the [ that follows: a character class cannot be reopened inside a character class (except for a POSIX character class such as [:alnum:]). The last ] is then unambiguous: it ends the character class. A [ outside a character class, however, must be escaped, since it would otherwise be parsed as the beginning of a character class.
In the same way, you can write []] or [[] or [^[] without escaping the [ or ] in the character class.
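The three classes can be tried out with a short Python sketch (Python's re being one of the flavours that accepts them). One assumption to flag: recent CPython versions may emit a FutureWarning about a "possible nested set" for a class that starts with [, so the sketch silences that warning while compiling. The sample string is invented for the example.

```python
import re
import warnings

with warnings.catch_warnings():
    # A class starting with [ may trigger a FutureWarning in CPython's re
    warnings.simplefilter("ignore", FutureWarning)
    only_close = re.compile(r"[]]")   # class containing just ]
    only_open = re.compile(r"[[]")    # class containing just [
    not_open = re.compile(r"[^[]")    # any character except [

sample = "a[b]"
closes = only_close.findall(sample)
opens = only_open.findall(sample)
others = not_open.findall(sample)
print(closes, opens, others)
```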
Note: since PHP 7.3, you can use the inline xx modifier, which causes blank characters to be ignored even inside character classes. This lets you write these classes in a less ambiguous way, for example: (?xx) [^ ][ ] [ ] ] [ [ ] [^ [ ]
You can use this syntax with several regex flavours: PCRE (PHP, R), Perl, Python, Java, .NET, Go, awk, Tcl (if you delimit your pattern with curly brackets, thanks Donal Fellows), ...
But not with: Ruby, JavaScript (except for IE < 9), ...
As m.buettner noted, [^]] is not ambiguous because ] is the first character of the class. By contrast, [^a]] is read as any character that is not a, followed by a literal ]. To exclude both a and ], you must write [^a\]] or [^]a].
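The difference between these forms is easy to check with a small Python sketch (Python's re parses these classes the same way). The variable names and test strings are chosen for the example only.

```python
import re

two_tokens = re.compile(r"[^a]]")      # class [^a], then a literal ]
escaped = re.compile(r"[^a\]]")        # one class: neither a nor ]
bracket_first = re.compile(r"[^]a]")   # same class, with ] placed first

# [^a]] needs two characters: a non-a character followed by ]
two_char_ok = two_tokens.fullmatch("b]") is not None
one_char_ok = two_tokens.fullmatch("b") is not None

# The other two forms match exactly one character that is neither a nor ]
escaped_results = [escaped.fullmatch(c) is not None for c in "ab]"]
first_results = [bracket_first.fullmatch(c) is not None for c in "ab]"]
print(two_char_ok, one_char_ok, escaped_results, first_results)
```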
In the particular case of JavaScript, the specification allows [] as a regex token that never matches (in other words, [] always fails) and [^] as a token that matches any character. [^]] is then read as any character followed by a ]. Actual implementations vary, but modern browsers generally stick to the definition in the specification.
Pattern details:
\[ # literal [
(?: # open a non capturing group
[^][] # a character that is not a ] or a [
| # OR
(?R) # the whole pattern (here is the recursion)
)* # repeat zero or more times
\] # a literal ]
In your pattern example, you don't need to escape the last ].
But you can do the same with a slightly optimized pattern, which is also more useful because it can be reused as a subpattern (with (?-1)): (\[(?:[^][]+|(?-1))*+])
( # open the capturing group
\[ # a literal [
(?: # open a non-capturing group
[^][]+ # any characters except ] or [, one or more times
| # OR
(?-1) # the last opened capturing group (recursion)
# (the capture group where you are)
)*+ # repeat the group zero or more times (possessive)
] # literal ] (no need to escape)
) # close the capturing group
or better: (\[[^][]*(?:(?-1)[^][]*)*+]) which avoids the cost of an alternation.
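To make the recursion concrete, here is a hand-written Python sketch that mirrors the structure of \[(?:[^][]|(?R))*\]. It is not PCRE and not a regex at all: Python's standard re module has no (?R), so the recursion is spelled out as a plain function. The names match_brackets and find_groups are hypothetical helpers invented for this example.

```python
def match_brackets(text, start=0):
    """Return the index just past a balanced [...] group that begins at
    `start`, or None if no balanced group begins there."""
    if start >= len(text) or text[start] != "[":   # \[
        return None
    pos = start + 1
    while pos < len(text):                         # (?: ... )*
        ch = text[pos]
        if ch == "]":                              # \]
            return pos + 1
        if ch == "[":                              # (?R): recurse on a nested group
            end = match_brackets(text, pos)
            if end is None:
                return None
            pos = end
        else:                                      # [^][]: any non-bracket character
            pos += 1
    return None                                    # ran out of input before ]


def find_groups(text):
    """Collect every outermost balanced group, scanning left to right."""
    groups = []
    pos = 0
    while pos < len(text):
        end = match_brackets(text, pos)
        if end is None:
            pos += 1
        else:
            groups.append(text[pos:end])
            pos = end
    return groups


print(find_groups("x[a[b]c]y[d]"))
```

Each nested [ triggers a fresh call to match_brackets, which is the role (?R) or (?-1) plays in the regex versions.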