How does this pattern match a hyphen without escaping it?

Question

After toddling around in regex101 for a few minutes, I realized that ] does not need to be escaped if it immediately follows [.

In regex101, the pattern []-a-z] is described as

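/[]-a-z]/
[]-a-z] matches a single character in the list below
  ]-a a single character in the range between ] and a (case sensitive)
  -z a single character in the list -z literally (case sensitive)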

But I always thought that if - has to be matched literally without being escaped, it should go either at the beginning or at the end.

Then why is my pattern not recognized as an error? Why does -z match a single character in the list -z literally?

Recommended answer

Let's break it down:

[]-a-z]
 ^^ ^
 || +---- 3
 |+------ 2
 +------- 1

1 is a literal ] because it appears immediately after the opening [, and [] is an invalid (empty) character class in PCRE by default.

The hyphen 2 is therefore the second character in the class, and it introduces a range between ] and a.

The next hyphen, 3, is treated literally because the previous token, a, is the end of the previous range. Another range cannot be introduced at this point. In PCRE, a - is treated literally if it is in a place where a range cannot be introduced, or if it is escaped. We usually place literal hyphens at the start or the end of the class to make this obvious, but that is not required.

Then, z is a simple literal.
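
The breakdown above can be checked empirically. What follows is a minimal sketch in Python, whose re module is not PCRE; the assumption here is that it parses this particular class the same way, which is worth verifying against your own engine. The helper name class_members is made up for illustration; it lists every printable ASCII character the class accepts.

```python
import re

def class_members(pattern: str) -> str:
    """Return all printable ASCII characters matched by a one-character pattern."""
    compiled = re.compile(pattern)
    # Printable ASCII runs from 0x20 (space) to 0x7E (~).
    return "".join(chr(c) for c in range(0x20, 0x7F) if compiled.fullmatch(chr(c)))

# Assumption: Python's re treats the leading ] and the second hyphen like PCRE does.
members = class_members(r"[]-a-z]")
print(members)  # -]^_`az
```

The result holds ], ^, _, ` and a (the range from ] to a) plus the literal - and z. Letters such as b or y are absent, which shows that -z is not read as a range.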

PCRE follows Perl syntax, which documents this behavior as follows:

About ]:

A ] is normally either the end of a POSIX character class (see POSIX Character Classes below), or it signals the end of the bracketed character class. If you want to include a ] in the set of characters, you must generally escape it.
However, if the ] is the first (or the second if the first character is a caret) character of a bracketed character class, it does not denote the end of the class (as you cannot have an empty class) and is considered part of the set of characters that can be matched without escaping.
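
As a rough illustration of the rule quoted above, here is a small Python sketch. Python's re module is a different engine from Perl and PCRE; the assumption is only that it applies the same "leading ] is literal" rule, and the variable names are illustrative.

```python
import re

# A leading ] is a literal member of the class, not its terminator.
leading = re.compile(r"[]abc]")
# The same holds when ] directly follows a negating caret.
negated = re.compile(r"[^]abc]")

leading_ok = all(leading.fullmatch(ch) for ch in "]abc")
negated_rejects = not any(negated.fullmatch(ch) for ch in "]abc")
negated_accepts_x = negated.fullmatch("x") is not None

# A bare [] is not an empty class in this engine: the ] is taken as a
# literal member, so the class is never closed and compilation fails.
try:
    re.compile(r"[]")
    empty_class_error = False
except re.error:
    empty_class_error = True

print(leading_ok, negated_rejects, negated_accepts_x, empty_class_error)
```

All four values come out True in Python's re; an engine that allows empty classes would behave differently on the last check.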

About the hyphen:

If a hyphen in a character class cannot syntactically be part of a range, for instance because it is the first or the last character of the character class, or if it immediately follows a range, the hyphen isn't special, and so is considered a character to be matched literally. If you want a hyphen in your set of characters to be matched and its position in the class is such that it could be considered part of a range, you must escape that hyphen with a backslash.

Note that this refers to Perl syntax. Other flavors may behave differently. For instance, [] is a valid (empty) character class in JavaScript that cannot match anything.

The catch is that, depending on the options, PCRE can also interpret this the JS way (there are a couple of JS compatibility flags). From the PCRE2 docs:

An opening square bracket introduces a character class, terminated by a closing square bracket. A closing square bracket on its own is not special by default. If a closing square bracket is required as a member of the class, it should be the first data character in the class (after an initial circumflex, if present) or escaped with a backslash. This means that, by default, an empty class cannot be defined. However, if the PCRE2_ALLOW_EMPTY_CLASS option is set, a closing square bracket at the start does end the (empty) class.

Unsurprisingly, the documented PCRE behavior for the hyphen matches the Perl behavior:

The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class, or immediately after a range. For example, [b-d-z] matches letters in the range b to d, a hyphen character, or z.
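
The hyphen positions described above can be compared side by side. This is a hedged sketch using Python's re module rather than PCRE itself, on the assumption that it handles these positions the same way; the helper matched and the probe string are hypothetical names chosen for the example.

```python
import re

def matched(pattern: str, probe: str = "-bcdez") -> str:
    """Return the probe characters that the one-character pattern accepts."""
    compiled = re.compile(pattern)
    return "".join(ch for ch in probe if compiled.fullmatch(ch))

patterns = [
    r"[b-d-z]",  # hyphen right after a range: literal
    r"[-bz]",    # hyphen first: literal
    r"[bz-]",    # hyphen last: literal
    r"[b\-z]",   # escaped hyphen: literal
    r"[b-z]",    # hyphen between two characters: a range
]
results = {p: matched(p) for p in patterns}
for pattern, chars in results.items():
    print(pattern, "->", chars)
```

Only the last pattern treats the hyphen as a range operator, so it is the only one that accepts c, d and e while rejecting the hyphen itself.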
