Capture outer paren groups while ignoring inner paren groups
Problem description
I'm using C# and regex, trying to capture outer paren groups while ignoring inner paren groups. I have legacy-generated text files containing thousands of string constructions like the following:
([txtData] of COMPOSITE
(dirty FALSE)
(composite [txtModel])
(view [star3])
(creationIndex 0)
(creationProps )
(instanceNameSpecified FALSE)
(containsObject nil)
(sName txtData)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)
([txtUI] of COMPOSITE
(dirty FALSE)
(composite [txtModel])
(view [star2])
(creationIndex 0)
(creationProps )
(instanceNameSpecified FALSE)
(containsObject nil)
(sName ApplicationWindow)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)
([star38] of COMPOSITE
(dirty FALSE)
(composite [txtUI])
(view [star39])
(creationIndex 26)
(creationProps composite [txtUI] sName Bestellblatt)
(instanceNameSpecified TRUE)
(containsObject COMPOSITE)
(sName Bestellblatt)
(txtDynamic FALSE)
(txtSubComposites )
(txtSubObjects )
(txtSubConnections )
)
I am looking for a regex that will capture the 3 groupings in the example above, and here is what I have tried so far:
Regex regex = new Regex(@"\((.*?)\)");
return regex.Matches(str);
The problem with the regex above is that it finds inner paren groupings such as dirty FALSE and composite [txtModel]. But what I want it to match is each of the outer groupings, such as the 3 shown above. The definition of an outer grouping is simple:
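1. The opening paren is either the first character in the file, or it follows a newline and/or carriage return.
2. The closing paren is either the last character in the file, or it is followed by a newline or carriage return.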
I want the regex pattern to ignore all paren-groupings that don't obey numbers 1 and 2 above. By "ignore" I mean that they shouldn't be seen as a match - but they should be returned as part of the outer grouping match.
So, for my objective to be met, when my C# regex runs against the example above, I should get back a regex MatchCollection with exactly 3 matches, just as shown above.
How is it done? (Thanks in advance.)
Recommended answer
You can achieve this with balancing groups.
Here is a demo to match outer brackets.
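string sentence = @"([txtData] of COM ..."; // your text
// The named group "c" tracks nesting depth: each inner "(" pushes a capture,
// each inner ")" pops one, and (?(c)(?!)) rejects the match if any are left open.
string pattern = @"\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)";
Regex rgx = new Regex(pattern);
foreach (Match match in rgx.Matches(sentence))
{
    Console.WriteLine(match.Value);   // one complete outer group, inner groups included
    Console.WriteLine("--------");
}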
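To make the pattern easier to follow, here is a minimal self-contained sketch that spreads the same expression over several lines using RegexOptions.IgnorePatternWhitespace and runs it on a shortened sample. The class name OuterGroupDemo, the helper FindOuterGroups, and the two-block sample string are hypothetical names and data chosen for illustration only; they are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuterGroupDemo
{
    // Same expression as the answer's pattern, laid out with inline comments.
    // IgnorePatternWhitespace lets the pattern contain whitespace and # comments.
    private static readonly Regex OuterGroups = new Regex(@"
        \(                  # opening paren of an outer group
        (?>                 # atomic group: do not backtrack into consumed text
            \( (?<c>)       #   inner opening paren: push an empty capture onto c
          | [^()]+          #   run of non-paren characters, newlines included
          | \) (?<-c>)      #   inner closing paren: pop one capture from c
        )*
        (?(c)(?!))          # if c still holds a capture, force the match to fail
        \)                  # closing paren of the outer group
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns one match per top-level paren group.
    public static MatchCollection FindOuterGroups(string text)
    {
        return OuterGroups.Matches(text);
    }

    public static void Main()
    {
        // Shortened, illustrative sample with two outer groups.
        string sample =
            "([txtData] of COMPOSITE\n(dirty FALSE)\n(composite [txtModel])\n)\n" +
            "([txtUI] of COMPOSITE\n(dirty FALSE)\n(creationProps )\n)\n";

        foreach (Match m in FindOuterGroups(sample))
        {
            Console.WriteLine(m.Value);
            Console.WriteLine("--------");
        }
    }
}
```

Because the pop alternative fails when the stack c is empty, the loop stops at the paren that closes the outer group, so each match spans the whole outer group with its inner groups inside it. The pattern tracks nesting depth only; it does not check the line-start and line-end conditions from the question.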