Capturing inner items using .NET Regex Balanced Matching

Question

I have found the following resources on Balanced Matching for .net Regexes:

  • http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx
  • http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
  • http://msdn.microsoft.com/en-us/library/bs2twtah%28VS.85%29.aspx#BalancingGroupDefinitionExample

From what I have read in these, the following example should work:

This regex should find an "a" anywhere within an angle-bracket group, no matter how deep. It should match "<a>", "<<a>>", "<a<>>", "<<>a>", "<<><a>>", etc.

(?<=
    ^
    (
    	(
    		<(?<Depth>)
    		|
    		>(?<-Depth>)
    	)
    	[^<>]*?
    )+?
)
(?(Depth)a|(?!))

It should, for example, match the "a" in the string "<<>a>".

While it will work for strings "<a<>>" and "<<a>>", I can't get it to match an "a" that is following a ">".

According to the explanations I have read, the first two "<"s should increment Depth twice, then the first ">" should decrement it once. At this point, (?(Depth)a|(?!)) should perform the "yes" option, but the regex never even makes it here.
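The depth bookkeeping described above can be sketched procedurally, without a regex. This is only an illustration of the intended semantics, not a fix for the pattern: the class name `DepthScan`, the method `FindNested`, and the choice to ignore a stray ">" when the depth is already zero are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class DepthScan
{
    // Returns the index of every 'a' that occurs while at least one '<' is still open.
    public static List<int> FindNested(string input)
    {
        var hits = new List<int>();
        int depth = 0;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '<')
            {
                depth++;                      // same role as <(?<Depth>)
            }
            else if (c == '>')
            {
                if (depth > 0) depth--;       // same role as >(?<-Depth>); assumption: a stray '>' is ignored
            }
            else if (c == 'a' && depth > 0)
            {
                hits.Add(i);                  // same role as (?(Depth)a|(?!))
            }
        }
        return hits;
    }
}
```

For the string "<<>a>", the depth goes 1, 2, 1 before the "a" is reached, so `DepthScan.FindNested("<<>a>")` reports index 3.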

Consider the following regex, which makes no such check and still fails to match the string in question:

(?<=
    ^
    (
    	(
    		<(?<Depth>)
    		|
    		>(?<-Depth>)
    	)
    	[^<>]*?
    )+?
)
a

Am I missing something, or is the regex engine working incorrectly?

Recommended answer

If you want to find every 'a' that's inside a balanced pair of angle brackets, I would suggest this approach:

Regex r = new Regex(@"
    <
      (?>
         [^<>a]+
       |
         (a)
       |
         <(?<N>)
       |
         >(?<-N>)
      )+
    (?(N)(?!))
    >
", RegexOptions.IgnorePatternWhitespace);
string target = @"012a<56a8<0a2<4a6a>>012a<56789a>23456a";
foreach (Match m in r.Matches(target))
{
  Console.WriteLine("{0}, {1}", m.Index, m.Value);
  foreach (Capture c in m.Groups[1].Captures)
  {
    Console.WriteLine("{0}, {1}", c.Index, c.Value);
  }
}

Result:

9, <0a2<4a6a>>
11, a
15, a
17, a
24, <56789a>
30, a

Instead of mucking about with the conditional, it goes ahead and matches the whole bracket-delimited (sub)string, in the process capturing any a's it might contain. Unlike your approach, it can pluck any number of bracketed substrings out of a larger string, and any number of a's out of each substring.
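As a rough sketch of how this could be packaged for reuse, the same pattern can be wrapped in a helper that returns, for each balanced bracketed substring, the positions of the a's inside it. The names `BalancedGroups`, `Find`, and the named group `Item` (standing in for the numbered group 1 above) are assumptions made for this example; the pattern is otherwise unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BalancedGroups
{
    // Same pattern as in the answer; the (a) group is named "Item" here for readability.
    private static readonly Regex Balanced = new Regex(@"
        <
          (?>
             [^<>a]+
           |
             (?<Item>a)
           |
             <(?<N>)
           |
             >(?<-N>)
          )+
        (?(N)(?!))
        >
    ", RegexOptions.IgnorePatternWhitespace);

    // Maps the start index of each balanced bracketed substring
    // to the indexes of every 'a' captured inside it.
    public static Dictionary<int, List<int>> Find(string input)
    {
        var result = new Dictionary<int, List<int>>();
        foreach (Match m in Balanced.Matches(input))
        {
            var items = new List<int>();
            foreach (Capture c in m.Groups["Item"].Captures)
            {
                items.Add(c.Index);
            }
            result[m.Index] = items;
        }
        return result;
    }
}
```

Called on the sample string from the answer, `BalancedGroups.Find` should yield two entries, keyed by 9 and 24, holding the indexes 11, 15, 17 and 30 respectively, which mirrors the printed result above.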
