Match a pattern only if between an opening and closing pattern
Problem description
My regex:
(?si)\bStart\b(.*?)\bError\b(.*?)\bEnd\b
This works for the following scenario:
stuff happens
Start
stuff happens
Error
stuff happens
End
But it also matches an Error outside of Start and End sequences:
Start
End
Error
Start
End
How can I match only hits like the one in the first example when the input looks like the second example?
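To reproduce the problem, here is a minimal Python sketch (not part of the original question) that runs the questioner's pattern against both inputs. The sample strings are assumptions modeled on the two examples above; Python's re module accepts the leading (?si) inline flags just as .NET does.

```python
import re

# The questioner's pattern: the lazy (.*?) groups stop as early as they can,
# but nothing prevents them from crossing an End or a new Start.
NAIVE = re.compile(r"(?si)\bStart\b(.*?)\bError\b(.*?)\bEnd\b")

# Sample inputs modeled on the two examples in the question (assumed text).
scenario1 = "stuff happens\nStart\nstuff happens\nError\nstuff happens\nEnd\n"
scenario2 = "Start\nEnd\nError\nStart\nEnd\n"

m1 = NAIVE.search(scenario1)
m2 = NAIVE.search(scenario2)

# First example: the match is one Start..End block, as intended.
print(repr(m1.group(0)))
# Second example: the match runs from the first Start, past the first End,
# through the stray Error, and on to the second End.
print(repr(m2.group(0)))
```

The second search does not fail: it returns a single match spanning both blocks, which is exactly the false positive the question describes.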
Recommended answer
Alexander's answer is probably good enough, but I would do it like this:
(?si)\bStart\b(?:(?!\b(?:Start|End)\b).)*\bError\b(?:(?!\b(?:Start|End)\b).)*\bEnd\b
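The main advantage of this regex is that it fails more quickly. The construct ((?!\bStart\b).)*? works fine if there is an End where you expect one, but if no match is possible, it still has to go all the way to the next Start (if there is one) or to the end of the document before it can give up on the match. Because the regex above also refuses to consume a standalone End before it has seen an Error, an attempt that begins at a Start whose block contains no Error is abandoned as soon as that block's End is reached.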
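The effect of this tempered construct can be checked with a short Python sketch. It is an illustration only: the answer's regex is copied verbatim, while the sample strings (including a hypothetical "mixed" input with a stray Error followed by a genuine block) are invented for the test.

```python
import re

# The answer's first regex: each character between the delimiters is consumed
# only if a standalone Start or End does not begin at that position, so a
# match can never run across a block boundary.
TEMPERED = re.compile(
    r"(?si)\bStart\b(?:(?!\b(?:Start|End)\b).)*\bError\b"
    r"(?:(?!\b(?:Start|End)\b).)*\bEnd\b"
)

# Hypothetical sample inputs.
samples = {
    "first example": "stuff happens\nStart\nstuff happens\nError\nstuff happens\nEnd\n",
    "second example": "Start\nEnd\nError\nStart\nEnd\n",
    "mixed": "Start\nEnd\nError\nStart\nError\nEnd\n",
}

results = {}
for name, text in samples.items():
    m = TEMPERED.search(text)
    results[name] = m.group(0) if m else None
    print(name, "->", repr(results[name]))
```

The second example now produces no match at all, and in the mixed input the stray Error between blocks is ignored; only the second, genuine Start..End block is returned.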
In fact, you can take it a step further and eliminate backtracking entirely:
(?si)\bStart\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bError\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bEnd\b
Adding an Error alternative and enclosing that part in an atomic group means that if it finds a Start and doesn't find an Error before the next End, it fails immediately, because the engine is not allowed to backtrack into the atomic group to try a shorter match.
Here's a PowerShell example (as generated by RegexBuddy):
$regex = [regex] '(?si)\bStart\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bError\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bEnd\b'
# $subject must already hold the text to search
$matchdetails = $regex.Match($subject)
while ($matchdetails.Success) {
    # matched text: $matchdetails.Value
    # match start: $matchdetails.Index
    # match length: $matchdetails.Length
    $matchdetails = $matchdetails.NextMatch()
}
UPDATE: I just realized that I shouldn't have added the Error branch to the second alternation. My regex matches only those Start..End blocks that contain Error exactly once, which is probably too specific. This version matches a block with at least one occurrence of Error in it:
(?si)\bStart\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bError\b(?>(?:(?!\b(?:Start|End)\b).)*)\bEnd\b