What is the meaning of + in a regex?


Question

What does the plus symbol in a regex mean?

Recommended answer

+ can actually have two meanings, depending on context.

As the other answers mention, + is usually a repetition operator: it causes the preceding token to repeat one or more times. a+ would be written as aa* in formal language theory, and can also be written as a{1,} (match at least once, with no upper limit).
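The equivalence between a+, aa* and a{1,} can be checked with a small sketch. Java is used here only because it is one of the flavours the answer names; the class name PlusQuantifierDemo and the helper fullMatch are made up for illustration.

```java
import java.util.regex.Pattern;

class PlusQuantifierDemo {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {"", "a", "aaa", "b"};
        for (String input : inputs) {
            // The three patterns should agree on every input.
            System.out.printf("input=\"%s\"  a+=%b  aa*=%b  a{1,}=%b%n",
                    input,
                    fullMatch("a+", input),
                    fullMatch("aa*", input),
                    fullMatch("a{1,}", input));
        }
    }
}
```

The empty string and "b" are rejected by all three patterns, because each of them needs at least one a.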

However, + can also make another quantifier possessive when it follows a repetition operator (i.e. ?+, *+, ++ or {m,n}+). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine) which tells the engine not to backtrack once a match has been made.

To understand how this works, we need to understand two concepts of regex engines: greediness and backtracking. Greediness means that, in general, a regex will try to consume as many characters as it can. Let's say our pattern is .* (the dot is a special construct in regexes which means any character [1]; the star means match zero or more times), and the target is aaaaaaaab. The entire string will be consumed, because the entire string is the longest match that satisfies the pattern.
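Greediness is easy to observe directly. The following is a minimal sketch in Java; GreedyDemo and firstMatch are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Hypothetical helper: the first substring matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The greedy .* takes every character it can, including the final b.
        System.out.println(firstMatch(".*", "aaaaaaaab")); // prints aaaaaaaab
    }
}
```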

However, let's say we change the pattern to .*b. Now, when the regex engine tries to match against aaaaaaaab, the .* will again consume the entire string. But the engine has then reached the end of the string with the pattern not yet satisfied (the .* consumed everything, and the pattern still has to match a b afterwards), so it backtracks, one character at a time, and tries to match b. The first backtrack makes the .* consume only aaaaaaaa, the b in the pattern can then match the final b, and the pattern succeeds.
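One way to see the result of that backtracking is to wrap the .* in a capturing group and inspect what it holds once the match succeeds. This is only a sketch: the capturing group is added for observation and is not part of the pattern discussed above, and BacktrackDemo and dotStarPart are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackDemo {
    // Hypothetical helper: what (.*) captured after matching (.*)b
    // against the whole input, or null if the match fails.
    static String dotStarPart(String input) {
        Matcher m = Pattern.compile("(.*)b").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // .* first grabs all nine characters, then gives one back so b can match.
        System.out.println(dotStarPart("aaaaaaaab")); // prints aaaaaaaa
        // With no b anywhere, backtracking runs out of options and the match fails.
        System.out.println(dotStarPart("aaaa"));      // prints null
    }
}
```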

Possessive quantifiers are also greedy, but as mentioned, once they return a match, the engine can no longer backtrack past that point. So if we change our pattern to .*+b (match any character zero or more times, possessively, followed by a b) and try to match aaaaaaaab, the .*+ will again consume the whole string. Because it is possessive, however, its backtracking information is discarded, so the b can never be matched and the pattern fails.
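The difference between the greedy and the possessive form can be reproduced in Java, one of the flavours listed above as supporting possessive quantifiers. This is a minimal sketch; PossessiveDemo and matchesAll are names chosen for this example, and the a*+b case is an extra illustration that is not in the original answer.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String target = "aaaaaaaab";
        // Greedy: .* gives back the final b, so the match succeeds.
        System.out.println(matchesAll(".*b", target));  // prints true
        // Possessive: .*+ keeps the final b, so nothing is left for b.
        System.out.println(matchesAll(".*+b", target)); // prints false
        // Possessive a*+ only ever consumes a's, so b is still available.
        System.out.println(matchesAll("a*+b", target)); // prints true
    }
}
```

The last case suggests that a possessive quantifier only causes a failure when it has consumed something the rest of the pattern needs.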

[1] In most engines, the dot will not match a newline character unless the /s ("singleline" or "dotall") modifier is specified.
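As a sketch of the footnote in one specific flavour: Java has no /s suffix syntax, and instead exposes the same behaviour as the Pattern.DOTALL flag or the inline (?s) flag. DotAllDemo and dotMatchesNewline are illustrative names.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical helper: does a.b match "a\nb" under the given flags?
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        // Default mode: the dot does not match the newline.
        System.out.println(dotMatchesNewline(0));              // prints false
        // DOTALL mode: the dot matches any character, newline included.
        System.out.println(dotMatchesNewline(Pattern.DOTALL)); // prints true
        // The inline flag (?s) switches on the same mode inside the pattern.
        System.out.println(Pattern.matches("(?s)a.b", "a\nb")); // prints true
    }
}
```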
