Regular expression that uses balancing groups

Question

I have a basic text template engine that uses a syntax like this:

foo bar
%IF MY_VAR
  some text
  %IF OTHER_VAR
    some other text
  %ENDIF
%ENDIF
bar foo

I have an issue with the regular expression that I am using to parse it whereby it is not taking into account the nested IF/ENDIF blocks.

The current regex I'm using is: %IF (?<Name>[\w_]+)(?<Contents>.*?)%ENDIF
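The failure can be reproduced outside .NET. The following is a minimal sketch in Python, not the asker's actual code: the group syntax is changed to `(?P<Name>...)` because Python's `re` module requires it, and `re.DOTALL` is an assumption (the question does not say which options are in use, but without an equivalent option `.*?` cannot cross line breaks).

```python
import re

TEMPLATE = """foo bar
%IF MY_VAR
  some text
  %IF OTHER_VAR
    some other text
  %ENDIF
%ENDIF
bar foo
"""

# The question's pattern, with (?<Name>...) spelled (?P<Name>...) for Python.
# re.DOTALL is an assumption so that .*? can span several lines.
NAIVE = re.compile(r"%IF (?P<Name>[\w_]+)(?P<Contents>.*?)%ENDIF", re.DOTALL)

match = NAIVE.search(TEMPLATE)
print(match.group("Name"))
print(repr(match.group("Contents")))

# Whatever follows the match still begins with the outer block's %ENDIF.
leftover = TEMPLATE[match.end():]
print(repr(leftover))
```

The lazy `.*?` stops at the first `%ENDIF` it meets, so the match for `MY_VAR` ends inside the outer block: its contents include the opening `%IF OTHER_VAR`, and the outer `%ENDIF` is left behind in the unmatched text.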

I have been reading up on balancing capture groups (a feature of .NET's regex library), as I understand this is the recommended way of supporting "recursive" regexes in .NET.

I've been playing with balancing groups and have so far come up with the following:

(
 (
  (?'Open'%IF\s(?<Name>[\w_]+))
  (?<Contents>.*?)
 )+
 (
  (?'Close-Open'%ENDIF)(?<Remainder>.*?)
 )+
)*
(?(Open)(?!))

But this is not behaving entirely how I would expect. It is for instance capturing a lot of empty groups. Help?

Recommended answer

To capture a whole IF/ENDIF block with balanced IF statements, you can use this regex:

%IF\s+(?<Name>\w+)
(?<Contents>
    (?> #Atomic group, so . will not match IF/ENDIF
        \s|
        (?<IF>%IF)|     #for IF, push
        (?<-IF>%ENDIF)| #for ENDIF, pop
        . # or, anything else, but don't allow
    )+
    (?(IF)(?!)) #fail on extra open IFs
)   #/Contents
%ENDIF
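For readers who want to see the push/pop bookkeeping spelled out, here is a rough emulation in Python. This is a swapped-in technique, not the regex above: Python's `re` module has no balancing groups, so an explicit depth counter stands in for the `IF` capture stack. The names `TOKEN` and `top_level_blocks` are hypothetical, and unbalanced input is simply dropped here, which may differ from how the .NET regex behaves.

```python
import re

# Tokens that change the nesting depth.
TOKEN = re.compile(r"%IF\s+(?P<Name>\w+)|%ENDIF\b")

def top_level_blocks(text):
    """Return (name, contents) for each outermost balanced %IF ... %ENDIF block."""
    blocks = []
    depth = 0     # plays the role of the IF capture stack
    name = None
    start = None  # index where the current outer block's contents begin
    for tok in TOKEN.finditer(text):
        if tok.group("Name") is not None:   # %IF: push
            if depth == 0:
                name, start = tok.group("Name"), tok.end()
            depth += 1
        elif depth > 0:                     # %ENDIF: pop
            depth -= 1
            if depth == 0:                  # stack empty: the outer block is closed
                blocks.append((name, text[start:tok.start()]))
        # a stray %ENDIF at depth 0 is ignored
    return blocks

TEMPLATE = """foo bar
%IF MY_VAR
  some text
  %IF OTHER_VAR
    some other text
  %ENDIF
%ENDIF
bar foo
"""

blocks = top_level_blocks(TEMPLATE)
print(blocks)
```

Only the outer block is reported, and its contents keep the nested `%IF OTHER_VAR ... %ENDIF` intact, which corresponds to what the Name and Contents groups are meant to hold.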

The point here is this: within a single Match, each named group gives you only one value. You will get only one (?<Name>\w+) value, for example: the last one captured. In my regex, I kept the Name and Contents groups of your simple regex, and limited the balancing to the inside of the Contents group - the regex is still wrapped in IF and ENDIF.

It becomes interesting when your data is more complex. For example:

%IF MY_VAR             
  some text
  %IF OTHER_VAR
    some other text
  %ENDIF
  %IF OTHER_VAR2
    some other text 2
  %ENDIF
%ENDIF                 
%IF OTHER_VAR3         
    some other text 3
%ENDIF                 

Here, you will get two matches, one for MY_VAR and one for OTHER_VAR3. If you want to capture the two IFs inside MY_VAR's contents, you have to rerun the regex on its Contents group. (You can get around this with a lookahead if you must: wrap the whole regex in (?=...) so that matches can overlap, but you will then need to rebuild the nesting yourself from the match positions and lengths.)
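The "rerun it on the Contents" idea can be sketched as a small recursive function. As before, this is a Python emulation under stated assumptions rather than the .NET regex itself: a depth counter replaces the balancing group, and `TOKEN` and `parse_blocks` are hypothetical names introduced for illustration.

```python
import re

TOKEN = re.compile(r"%IF\s+(?P<Name>\w+)|%ENDIF\b")

def parse_blocks(text):
    """Return a list of (name, children) trees for the balanced %IF blocks in text."""
    blocks, depth, name, start = [], 0, None, None
    for tok in TOKEN.finditer(text):
        if tok.group("Name") is not None:   # %IF: one level deeper
            if depth == 0:
                name, start = tok.group("Name"), tok.end()
            depth += 1
        elif depth > 0:                     # %ENDIF: one level up
            depth -= 1
            if depth == 0:
                # Rerun the same scan on this block's contents only.
                contents = text[start:tok.start()]
                blocks.append((name, parse_blocks(contents)))
    return blocks

SAMPLE = """%IF MY_VAR
  some text
  %IF OTHER_VAR
    some other text
  %ENDIF
  %IF OTHER_VAR2
    some other text 2
  %ENDIF
%ENDIF
%IF OTHER_VAR3
    some other text 3
%ENDIF
"""

tree = parse_blocks(SAMPLE)
print(tree)
```

Each level of the result holds only the outermost blocks of the text it was given, so the two top-level entries mirror the two matches described above, and the nested IFs appear as children of MY_VAR.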

Now, I won't explain too much, because it seems you get the basics, but a short note about the Contents group: I've used an atomic group, (?> ), sometimes loosely called a possessive group, to avoid backtracking. Otherwise, it would be possible for the dot to eventually match whole IFs and break the balance. A lazy match on the group would behave similarly (( )+? instead of (?> )+).
