Capturing inner items using .net Regex Balanced Matching
Question
I have found the following resources on Balanced Matching for .net Regexes:
- http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx
- http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
- http://msdn.microsoft.com/en-us/library/bs2twtah%28VS.85%29.aspx#BalancingGroupDefinitionExample
From what I have read in these, the following example should work:
This regex should find an "a" anywhere within an angle-bracket group, no matter how deep. It should match "<a>", "<<a>>", "<a<>>", "<<>a>", "<<><a>>", etc.
(?<=
^
(
(
<(?<Depth>)
|
>(?<-Depth>)
)
[^<>]*?
)+?
)
(?(Depth)a|(?!))
I expect it to match the "a" in the string "<<>a>".
While it works for the strings "<a<>>" and "<<a>>", I can't get it to match an "a" that follows a ">".
According to the explanations I have read, the first two "<"s should increment Depth twice, then the first ">" should decrement it once. At this point, (?(Depth)a|(?!)) should perform the "yes" option, but the regex never even makes it here.
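The push and pop behavior described above can be checked in isolation with a plain forward match. The following is a minimal sketch, not part of the original post: the class name `DepthDemo`, the helper `RemainingDepth`, and the simplified pattern are hypothetical and exist only for illustration. It scans the longest prefix the pattern accepts and reports how many `Depth` captures are still on the stack afterwards.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DepthDemo
{
    // Hypothetical helper: "<" pushes an empty capture onto Depth,
    // ">" pops one, and any other character is skipped.
    // Returns the number of Depth captures left after the scan.
    public static int RemainingDepth(string input)
    {
        var r = new Regex(@"^(?:<(?<Depth>)|>(?<-Depth>)|[^<>])*");
        Match m = r.Match(input);
        return m.Groups["Depth"].Captures.Count;
    }

    public static void Main()
    {
        foreach (string s in new[] { "<<", "<<>", "<<>>" })
        {
            Console.WriteLine("{0} -> {1}", s, RemainingDepth(s));
        }
    }
}
```

Scanning left to right, "<<>" leaves one capture on the stack, which is the state the conditional in the question is meant to test for.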
Consider the following regex, which makes no such check and still fails to match the string in question:
(?<=
^
(
(
<(?<Depth>)
|
>(?<-Depth>)
)
[^<>]*?
)+?
)
a
Am I missing something, or is the regex engine working incorrectly?
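For anyone who wants to reproduce this, here is a minimal sketch of a test harness. It assumes the pattern is compiled with RegexOptions.IgnorePatternWhitespace (the post does not say, but the multi-line layout suggests it); the class name `LookbehindRepro` and the helper `Finds` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindRepro
{
    // The first pattern from the question, unchanged apart from indentation.
    // Assumption: it is meant to run with IgnorePatternWhitespace.
    private static readonly Regex Pattern = new Regex(@"
        (?<=
          ^
          (
            (
              <(?<Depth>)
              |
              >(?<-Depth>)
            )
            [^<>]*?
          )+?
        )
        (?(Depth)a|(?!))
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Finds(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        foreach (string s in new[] { "<a>", "<<a>>", "<a<>>", "<<>a>" })
        {
            Console.WriteLine("{0} -> {1}", s, Finds(s));
        }
    }
}
```

A likely explanation for the failure, offered as a hedged note rather than something stated in the thread: .NET evaluates the contents of a lookbehind from right to left. Starting from the "a" in "<<>a>", the first bracket it reaches is the ">", and at that point Depth is still empty, so the pop in `>(?<-Depth>)` fails before any "<" has been counted.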
Recommended answer
If you want to find every 'a' that's inside a balanced pair of angle brackets, I would suggest this approach:
Regex r = new Regex(@"
    <
    (?>
        [^<>a]+
      |
        (a)
      |
        <(?<N>)
      |
        >(?<-N>)
    )+
    (?(N)(?!))
    >
    ", RegexOptions.IgnorePatternWhitespace);

string target = @"012a<56a8<0a2<4a6a>>012a<56789a>23456a";
foreach (Match m in r.Matches(target))
{
    // Index and text of each balanced bracket group
    Console.WriteLine("{0}, {1}", m.Index, m.Value);
    foreach (Capture c in m.Groups[1].Captures)
    {
        // Index and text of each 'a' captured inside that group
        Console.WriteLine("{0}, {1}", c.Index, c.Value);
    }
}
Result:
9, <0a2<4a6a>>
11, a
15, a
17, a
24, <56789a>
30, a
Instead of mucking about with the conditional, it goes ahead and matches the whole bracket-delimited (sub)string, in the process capturing any 'a's it might contain. Unlike your approach, it can pluck any number of bracketed substrings out of a larger string, and any number of 'a's out of each substring.
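If only the positions of the 'a's are needed, the same regex can be wrapped in a small helper. This is a sketch under assumptions: the class name `BalancedCaptures` and the method `IndicesOfA` are hypothetical names chosen for illustration, and the pattern is the one shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BalancedCaptures
{
    // Same pattern as above: match a whole balanced bracket group,
    // capturing every 'a' inside it in group 1.
    private static readonly Regex Balanced = new Regex(@"
        <
        (?>
            [^<>a]+
          |
            (a)
          |
            <(?<N>)
          |
            >(?<-N>)
        )+
        (?(N)(?!))
        >
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: indices of every 'a' captured inside
    // a balanced bracket group matched by the pattern.
    public static List<int> IndicesOfA(string input)
    {
        var indices = new List<int>();
        foreach (Match m in Balanced.Matches(input))
        {
            foreach (Capture c in m.Groups[1].Captures)
            {
                indices.Add(c.Index);
            }
        }
        return indices;
    }

    public static void Main()
    {
        string target = "012a<56a8<0a2<4a6a>>012a<56789a>23456a";
        Console.WriteLine(string.Join(", ", IndicesOfA(target)));
    }
}
```

For the target string used above, this should yield the same capture indices as the result listing: 11, 15, 17 and 30. The atomic group matters here: once it has consumed a ">" as a pop, it cannot give that character back, which is why the unbalanced "<" at index 4 does not produce a match.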