PHP regex non-capture non-match group
Problem description
I'm making a date-matching regex, and it's all going pretty well. I've got this so far:
"/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/"
It will (hopefully) match single or double digit days and months, and double or quadruple digit years in the 21st century. A few trials and errors have gotten me this far.
But, I've got two simple questions regarding these results:
- (?: ) : what is a simple explanation for this? Apparently it's a non-matching group. But then...
- What is the trailing ? for? E.g. (?: )?
[Edited (again) to improve formatting and fix the intro.]
This is a comment and an answer.
The answer part... I do agree with alex's earlier answer.
(?: ), in contrast to ( ), is used to avoid capturing text, generally so as to have fewer back references thrown in with those you do want, or to improve speed.
The ? following the (?: ) (or following anything except * + ? or {}) means that the preceding item may or may not be found within a legitimate match. E.g., /z34?/ will match z3 as well as z34, but, when tested against the whole string (as in /^z34?$/), it won't match z35 or z.
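A minimal PHP sketch of both points, using only preg_match(). The variable names and the sample input '25-' are illustrative assumptions, not part of the original answer; the anchors ^ and $ are added so that the z34? test applies to the whole string.

```php
<?php
// Capturing group: the optional leading digit is stored as group 1.
preg_match('/([0-3])?[0-9]-/', '25-', $capturing);

// Non-capturing group: same overall match, but no group is stored.
preg_match('/(?:[0-3])?[0-9]-/', '25-', $nonCapturing);

// $capturing holds the full match plus group 1; $nonCapturing holds only the full match.
$capturingCount = count($capturing);
$nonCapturingCount = count($nonCapturing);

// The trailing ? makes the preceding item (here the 4) optional.
$results = [];
foreach (['z3', 'z34', 'z35', 'z'] as $candidate) {
    // Anchored so the whole string must match, not just a substring.
    $results[$candidate] = preg_match('/^z34?$/', $candidate);
}

// Without anchors, 'z35' still matches because it contains 'z3'.
$unanchored = preg_match('/z34?/', 'z35');
```

Note that the unanchored pattern finds z3 inside z35, which is why the whole-string form is used for the comparison.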
The comment part... I made what might be considered improvements to the regex you were working on:
(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)
-- First, it avoids things like 0-0-2011
-- Second, it avoids things like 233443-4-201154564
-- Third, it includes things like 1-1-2022
-- Fourth, it includes things like 1-1-11
-- Fifth, it avoids things like 34-4-11
-- Sixth, it allows you to capture the day, month, and year so you can refer to them more easily in code. Such code could, for example, run a further check to see whether a Feb 29 date qualifies: if the second captured group is 2, then either the first captured group is 29 and the year is a leap year, or the first captured group is less than 29.
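The follow-up check described in the sixth point could be sketched as below. The function name parseDate() is hypothetical, and treating two-digit years as 20xx is an assumption; instead of hand-coding the leap-year rule, this sketch hands the captured groups to PHP's built-in checkdate().

```php
<?php
// Hypothetical helper (not from the original answer): match a d-m-y date,
// then let checkdate() reject impossible dates such as 29-2-11.
function parseDate(string $text): ?array
{
    $pattern = '/(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)/';
    if (!preg_match($pattern, $text, $m)) {
        return null; // the text does not look like a date at all
    }

    $day = (int) $m[1];   // first captured group
    $month = (int) $m[2]; // second captured group
    $year = (int) $m[3];  // third captured group
    if ($year < 100) {
        $year += 2000;    // assumption: two-digit years mean 20xx
    }

    // checkdate() takes month, day, year and knows about leap years.
    return checkdate($month, $day, $year) ? [$day, $month, $year] : null;
}
```

With this sketch, parseDate('29-2-2012') returns the three numbers because 2012 is a leap year, while parseDate('29-2-11') returns null even though the regex itself matches.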
Finally, note that you'll still get dates that don't exist, e.g., 31-6-11. If you want to avoid these, then try:
(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)
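One thing to keep in mind when using this stricter pattern from PHP: each of the three alternatives has its own pair of capturing groups, so the day lands in group 1, 3, or 5, the month in group 2, 4, or 6, and the year in group 7. The sketch below shows one way to read them back; the function name matchStrict() is hypothetical. It relies on preg_match() reporting unmatched groups as empty strings, so concatenating the three candidates yields the one that matched.

```php
<?php
// The stricter pattern, unchanged apart from the / delimiters and line splits.
$strict = '/(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))'
        . '|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))'
        . '|(?:(0?[1-9]|[1-2][0-9])-(0?2)))'
        . '-((?:20)?[0-9][0-9])(?:\s|$)/';

// Hypothetical helper: return day, month, and year regardless of which branch matched.
function matchStrict(string $text, string $pattern): ?array
{
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    // Only one branch matched; the groups of the other branches are empty strings.
    return [
        'day'   => (int) ($m[1] . $m[3] . $m[5]),
        'month' => (int) ($m[2] . $m[4] . $m[6]),
        'year'  => $m[7],
    ];
}
```

The pattern rejects 31-6-11 but still accepts 29-2-11, so the leap-year check remains a separate step.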
Also, I assumed the dates would be preceded and followed by a space (or the beginning/end of the line), but you may want to adjust that (e.g., to allow punctuation).
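As one possible adjustment (an assumption on my part, not something the original pattern does), the whitespace boundaries can be swapped for lookarounds that only forbid an adjacent digit or hyphen. That lets a date sit next to punctuation while still rejecting long digit runs. The variable names and sample strings are illustrative.

```php
<?php
// Sketch: (?<![0-9-]) and (?![0-9-]) replace the (?:^|\s) and (?:\s|$) boundaries.
$loose = '/(?<![0-9-])(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?![0-9-])/';

// A date wrapped in parentheses and followed by a full stop.
$inPunctuation = preg_match($loose, 'Due (1-1-11).', $m);

// A long digit run must still be rejected.
$inDigitRun = preg_match($loose, '233443-4-201154564');
```

Here $inPunctuation is 1 with $m[1], $m[2], and $m[3] holding the day, month, and year, while $inDigitRun is 0.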
A commenter elsewhere referenced this resource which you might find useful: http://rubular.com/