What is the meaning of + in a regex?
Question
What does the plus symbol in regex mean?
Recommended answer
`+` can actually have two meanings, depending on context.
As the other answers mention, `+` is usually a repetition operator that causes the preceding token to repeat one or more times. `a+` would be written `aa*` in formal language theory, and can also be written `a{1,}` (match at least once, with no upper limit).
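The equivalence of the three spellings can be checked with a small sketch. It uses Python's standard `re` module; the sample strings and the helper name `full_match` are made up for illustration and are not part of the original answer.

```python
import re

# Three spellings of "one or more a": the + quantifier, the
# formal-language form aa*, and the counted form a{1,}.
patterns = [r"a+", r"aa*", r"a{1,}"]

# Illustrative inputs (hypothetical, chosen for this sketch).
samples = ["", "a", "aaa", "b", "baaa"]


def full_match(pattern, text):
    """Return True when the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# One row of booleans per pattern, one column per sample string.
results = {p: [full_match(p, s) for s in samples] for p in patterns}

for pattern, row in results.items():
    print(pattern, row)
```

All three patterns should produce the same row, since each requires at least one `a` and nothing else.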
However, `+` can also make another quantifier possessive when it follows a repetition operator (i.e. `?+`, `*+`, `++` or `{m,n}+`). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine) that tells the engine not to backtrack into the quantified token once it has matched.
To understand how this works, we need two concepts from regex engines: greediness and backtracking. Greediness means that, in general, a quantifier tries to consume as many characters as it can. Say our pattern is `.*` (the dot is a special construct in regexes that means any character (see note 1); the star means match zero or more times) and the target is `aaaaaaaab`. The entire string is consumed, because the entire string is the longest match that satisfies the pattern.
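A minimal sketch of that greedy behaviour, assuming Python's `re` module as the engine (the answer itself is engine-agnostic):

```python
import re

target = "aaaaaaaab"

# .* is greedy: it takes as many characters as it can,
# which here is the whole target string.
match = re.match(r".*", target)
consumed = match.group(0)

print(consumed == target)
```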
However, say we change the pattern to `.*b`. Now, when the regex engine tries to match against `aaaaaaaab`, the `.*` again consumes the entire string. The engine has then reached the end of the string with the pattern not yet satisfied (the `.*` consumed everything, but the pattern still has to match `b` afterwards), so it backtracks, one character at a time, and tries to match `b`. The first backtrack leaves the `.*` consuming `aaaaaaaa`, then `b` can consume `b`, and the pattern succeeds.
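The result of that backtracking can be observed by wrapping `.*` in a capture group. The group is an addition for this sketch only, so that the text kept by `.*` is visible; it does not change what the pattern matches. Python's `re` module is assumed as the engine.

```python
import re

target = "aaaaaaaab"

# Same pattern as in the text, with .* captured for inspection.
match = re.match(r"(.*)b", target)

star_part = match.group(1)  # what .* holds after giving back one character
whole = match.group(0)      # the full match, including the final b

print("kept by .*:", star_part)
print("whole match:", whole)
```

The capture shows only the final state. The intermediate attempt, in which `.*` held the entire string, is not visible from the match object.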
Possessive quantifiers are also greedy but, as mentioned, once they have matched, the engine can no longer backtrack into them. So if we change the pattern to `.*+b` (match any character zero or more times, possessively, followed by a `b`) and try to match `aaaaaaaab`, the `.*+` again consumes the whole string, but because it is possessive, its backtracking information is discarded, `b` cannot be matched, and the pattern fails.
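The contrast between the greedy and possessive forms can be sketched as follows. The answer names PCRE, Java and JGsoft as supporting flavours. This sketch uses Python instead, whose `re` module added possessive quantifiers in version 3.11, so the possessive pattern is only tried when the interpreter is new enough. The variable name `possessive_supported` is made up for illustration.

```python
import re
import sys

target = "aaaaaaaab"

# Greedy: .* takes everything, then backtracks one character so b can match.
greedy = re.match(r".*b", target)

# Possessive quantifiers arrived in Python 3.11; older versions raise
# re.error on this pattern, so it is only evaluated when supported.
possessive_supported = sys.version_info >= (3, 11)
possessive = re.match(r".*+b", target) if possessive_supported else None

print("greedy:", greedy.group(0))
print("possessive:", possessive)  # None: .*+ never gives the b back
```

On interpreters older than 3.11 the second result is `None` only because the pattern was skipped, so the comparison is meaningful on 3.11 or later.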
Note 1: In most engines, the dot does not match a newline character unless the `/s` ("singleline" or "dotall") modifier is specified.
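As an illustration of the note, here is a short sketch in Python, where the counterpart of the `/s` modifier is the `re.DOTALL` flag. The sample text is made up for this example.

```python
import re

# Hypothetical two-line input for illustration.
text = "line one\nline two"

# By default the dot stops at the newline.
default_match = re.match(r".*", text).group(0)

# With DOTALL the dot also matches the newline, so .* spans both lines.
dotall_match = re.match(r".*", text, flags=re.DOTALL).group(0)

print(repr(default_match))
print(repr(dotall_match))
```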