Match a pattern only if between an opening and closing pattern

Question

My regular expression:

(?si)\bStart\b(.*?)\bError\b(.*?)\bEnd\b

This works for the following scenario:

stuff happens  
Start  
stuff happens  
Error  
stuff happens  
End

But it also matches an Error that lies outside a Start...End sequence:

Start  
End  
Error  
Start  
End

How can I match only hits like the one in the first example when the input looks like scenario #2?

Recommended answer

Alexander's answer is probably good enough, but I would do it like this:

(?si)\bStart\b(?:(?!\b(?:Start|End)\b).)*\bError\b(?:(?!\b(?:Start|End)\b).)*\bEnd\b

The main advantage of this regex is that it fails more quickly. ((?!\bStart\b).)*? works fine if there is an End where you expect one, but if no match is possible, it still has to scan all the way to the next Start (if there is one) or to the end of the document before it can give up on the match. Because the lookahead in my version rejects End as well as Start, the scan stops at the first End instead.
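
To make the difference concrete, here is a minimal sketch that runs the question's regex and the regex above against the two scenarios from the question. It uses Python's re module rather than .NET, which is an assumption of this sketch and not part of the original answer; the names ORIGINAL, TEMPERED, scenario_1 and scenario_2 are illustrative.

```python
import re

# The regex from the question: lazy dots can run past an End.
ORIGINAL = re.compile(r"(?si)\bStart\b(.*?)\bError\b(.*?)\bEnd\b")

# The regex from the answer: each dot is guarded by a lookahead that
# refuses to step onto a Start or End keyword.
TEMPERED = re.compile(
    r"(?si)\bStart\b(?:(?!\b(?:Start|End)\b).)*\bError\b"
    r"(?:(?!\b(?:Start|End)\b).)*\bEnd\b"
)

# Scenario 1: Error sits inside a Start...End block.
scenario_1 = "stuff happens\nStart\nstuff happens\nError\nstuff happens\nEnd"
# Scenario 2: Error sits between two empty Start...End blocks.
scenario_2 = "Start\nEnd\nError\nStart\nEnd"

for name, text in (("scenario 1", scenario_1), ("scenario 2", scenario_2)):
    print(name,
          "| original matches:", ORIGINAL.search(text) is not None,
          "| tempered matches:", TEMPERED.search(text) is not None)
```

Both patterns match scenario 1, but only the question's regex matches scenario 2, where it spans from the first Start across the stray Error to the last End.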

In fact, you can take it a step further and eliminate backtracking entirely:

(?si)\bStart\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bError\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bEnd\b

Adding an Error alternative and enclosing that part in an atomic group means that if it finds a Start and doesn't find an Error before the next End, it fails immediately.

Here's a PowerShell example (as generated by RegexBuddy):

$regex = [regex] '(?si)\bStart\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bError\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bEnd\b'
$matchdetails = $regex.Match($subject)
while ($matchdetails.Success) {
    # matched text: $matchdetails.Value
    # match start: $matchdetails.Index
    # match length: $matchdetails.Length
    $matchdetails = $matchdetails.NextMatch()
}

UPDATE: I just realized that I shouldn't have added the Error branch to the second alternation. With it, my regex matches only those Start..End blocks that contain Error exactly once, which is probably too specific. This version matches a block with at least one occurrence of Error in it:

(?si)\bStart\b(?>(?:(?!\b(?:Start|End|Error)\b).)*)\bError\b(?>(?:(?!\b(?:Start|End)\b).)*)\bEnd\b
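
The effect of the update can be checked with a small sketch, again in Python's re module (an assumption of this sketch, not the original answer's environment). The atomic groups are written as plain non-capturing groups so the code also runs on Python versions before 3.11; since each guarded run already stops at the first excluded keyword, this should change only how much backtracking happens, not which text matches. The names EXACTLY_ONCE, AT_LEAST_ONCE, one_error and two_errors are illustrative.

```python
import re

# Earlier version: Error is excluded on both sides of the literal Error,
# so a block with a second Error cannot reach End.
EXACTLY_ONCE = re.compile(
    r"(?si)\bStart\b(?:(?!\b(?:Start|End|Error)\b).)*\bError\b"
    r"(?:(?!\b(?:Start|End|Error)\b).)*\bEnd\b"
)

# Updated version: Error is excluded only before the first Error,
# so further Errors are allowed between it and End.
AT_LEAST_ONCE = re.compile(
    r"(?si)\bStart\b(?:(?!\b(?:Start|End|Error)\b).)*\bError\b"
    r"(?:(?!\b(?:Start|End)\b).)*\bEnd\b"
)

one_error = "Start\nstuff\nError\nEnd"
two_errors = "Start\nError\nmore\nError\nEnd"

for name, text in (("one Error", one_error), ("two Errors", two_errors)):
    print(name,
          "| exactly-once matches:", EXACTLY_ONCE.search(text) is not None,
          "| at-least-once matches:", AT_LEAST_ONCE.search(text) is not None)
```

Both versions accept the block with a single Error; only the updated version accepts the block that contains two.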
