Regex - Capturing a Repeated Group
Question
Alright, I've read the tutorials and scrambled my head too much to be able to see clearly now.
I'm trying to capture parameters and their type info from a function signature. So given a signature like this:
function(/*string*/a,b,c)
I want to get the parts like this:
type: string
param:a
param:b
param:c
This would be fine too:
type: string
param:a
type: null (or whitespace)
param:b
type: null (or whitespace)
param:c
So I came up with this regex, which makes the common mistake of repeating the capture group (I have explicit capture turned on):
function\(((\/\*(?<type>[a-zA-Z]+)\*\/)?(?<param>[0-9a-zA-Z_$]+),?)*\)
Problem is, I can't correct the mistake. :(. Please help!
Recommended answer
Generally, you'd need two steps to get all data.
First, match/validate the whole function:
function\((?<parameters>((\/\*[a-zA-Z]+\*\/)?[0-9a-zA-Z_$]+,?)*)\)
Note that you now have a parameters group containing all the parameters. You can match part of the pattern again to get each parameter, or in this case, simply split on the comma (,).
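As a rough illustration of the two-step approach in an engine that keeps only the last capture of a repeated group, here is a minimal Python sketch. The helper name split_parameters is hypothetical, and Python spells named groups (?P<name>...) instead of (?<name>...), so the pattern is adapted rather than copied verbatim.

```python
import re

# Step 1: validate the whole signature and grab the raw parameter list.
# Python writes named groups as (?P<name>...) rather than (?<name>...).
SIGNATURE = re.compile(
    r"function\((?P<parameters>(?:(?:/\*[a-zA-Z]+\*/)?[0-9a-zA-Z_$]+,?)*)\)"
)


def split_parameters(signature):
    """Return the raw parameter chunks, or None if the signature is invalid.

    Hypothetical helper, used only for this illustration.
    """
    match = SIGNATURE.fullmatch(signature)
    if match is None:
        return None
    raw = match.group("parameters")
    # Step 2: split on the comma; drop the empty chunk that an empty
    # parameter list or a trailing comma would leave behind.
    return [chunk for chunk in raw.split(",") if chunk]


chunks = split_parameters("function(/*string*/a,b,c)")
print(chunks)  # ['/*string*/a', 'b', 'c']
```

Each chunk still carries its optional /*type*/ prefix, so a second, simpler pattern can be applied per chunk if the type is needed separately.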
If you're using .Net, by any chance, you're in luck. .Net keeps a full record of all captures of each group, so you can use the collection:
match.Groups["param"].Captures
Some notes:
- If you do want to capture more than one type, you definitely want empty matches, so the type and param captures pair up one-to-one (you could sort the captures instead, but a 1-to-1 capture is neater). In that case, you want the optional group inside your captured group:
(?<type>(\/\*[a-zA-Z]+\*\/)?)
- You don't have to escape slashes in .Net patterns: / has no special meaning there (C#/.Net doesn't have regex delimiters).
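The same 1-to-1 idea can be sketched outside .Net by matching one parameter at a time, so that every param arrives together with its (possibly empty) type. This is a different technique from the .Net capture collections, shown in Python only as an illustration; typed_parameters is a hypothetical helper name and the pattern uses Python's (?P<name>...) spelling.

```python
import re

# One match per parameter; the /*type*/ comment in front of it is optional.
PARAMETER = re.compile(
    r"(?:/\*(?P<type>[a-zA-Z]+)\*/)?(?P<param>[0-9a-zA-Z_$]+)"
)


def typed_parameters(parameter_list):
    """Return (type, param) pairs; type is '' when no /*type*/ comment is present.

    Hypothetical helper, used only for this illustration.
    """
    return [
        # An optional group that did not take part in the match yields None,
        # so it is normalised to an empty string to keep the pairs aligned.
        (m.group("type") or "", m.group("param"))
        for m in PARAMETER.finditer(parameter_list)
    ]


pairs = typed_parameters("/*string*/a,b,c")
print(pairs)  # [('string', 'a'), ('', 'b'), ('', 'c')]
```

Because the type and the param come from the same match, there is no separate list of types that could fall out of step with the list of params.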
Here's an example of using the captures. Again, the main point is maintaining the relation between type and param: you want to capture empty types, so you don't lose count.
Pattern:
function
\(
(?:
    (?:
        /\*(?<type>[a-zA-Z]+)\*/   # type within /* */
        |                          # or
        (?<type>)                  # capture an empty type.
    )
    (?<param>
        [0-9a-zA-Z_$]+
    )
    (?:,|(?=\s*\)))                # mandatory comma, unless before the last ')'
)*
\)
Code:
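using System;
using System.Text.RegularExpressions;

// s is the signature text; pattern is the pattern above (it relies on IgnorePatternWhitespace).
Match match = Regex.Match(s, pattern, RegexOptions.IgnorePatternWhitespace);
CaptureCollection types = match.Groups["type"].Captures;
CaptureCollection parameters = match.Groups["param"].Captures;
// Every param capture has a (possibly empty) type capture at the same index.
for (int i = 0; i < parameters.Count; i++)
{
    string parameter = parameters[i].Value;
    string type = types[i].Value;
    if (String.IsNullOrEmpty(type))
        type = "NO TYPE";
    Console.WriteLine("Parameter: {0}, Type: {1}", parameter, type);
}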