Regular expression with variable number of groups?
Question
Is it possible to create a regular expression with a variable number of groups?
For example, after running...
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("ab([cd])*ef");
Matcher m = p.matcher("abcddcef");
m.matches(); // true: the whole string matches
...I would like something like:
- m.group(1) = "c"
- m.group(2) = "d"
- m.group(3) = "d"
- m.group(4) = "c"
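For contrast, here is a minimal runnable check of what java.util.regex actually reports for the question's pattern. The class name LastCaptureDemo and the helper inspect are hypothetical, chosen only for illustration. The * repeats the group, but the pattern still contains exactly one capturing group, and that group holds only the last repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows that a quantified group keeps only its last match.
class LastCaptureDemo {
    // Returns {groupCount, group(1)} for the question's pattern, or null if no match.
    static String[] inspect(String input) {
        Pattern p = Pattern.compile("ab([cd])*ef");
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return null; // no match: nothing to report
        }
        return new String[] { String.valueOf(m.groupCount()), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = inspect("abcddcef");
        // One group in the pattern; it holds the last [cd] matched.
        System.out.println("groupCount = " + r[0] + ", group(1) = " + r[1]);
        // prints "groupCount = 1, group(1) = c"
    }
}
```

Asking for m.group(2) here would throw IndexOutOfBoundsException, because the number of groups is fixed by the pattern, not by how many times the quantifier repeats.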
(Background: I'm parsing some lines of data, and one of the "fields" is repeating. I would like to avoid a matcher.find loop for these fields.)
As pointed out by @Tim Pietzcker in the comments, perl6 and .NET have this feature.
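Since java.util.regex keeps only the last capture per group, one common workaround is a two-pass match. This is a sketch that assumes the repeated field can be described by its own small pattern: capture the whole repeated run as a single group, then split that run with a second pattern. Matcher.results() (Java 9+) streams the successive matches, so the code has no explicit find loop, although the stream is still driven by repeated matching underneath. The names RepeatedFieldSplitter, LINE, ITEM and splitRepeats are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: recovers every repetition that a single quantified
// group like ([cd])* would otherwise collapse into its last match.
class RepeatedFieldSplitter {
    // Pass 1: the whole repeated run is captured once, as group 1.
    private static final Pattern LINE = Pattern.compile("ab([cd]*)ef");
    // Pass 2: one match per repeated item.
    private static final Pattern ITEM = Pattern.compile("[cd]");

    static List<String> splitRepeats(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("line does not match: " + line);
        }
        // Matcher.results() (Java 9+) yields one MatchResult per match of ITEM.
        return ITEM.matcher(m.group(1))
                   .results()
                   .map(MatchResult::group)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitRepeats("abcddcef")); // prints [c, d, d, c]
    }
}
```

Keeping the item pattern separate from the line pattern lets the repeated field be more complex than a single character without changing the outer pattern.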
Recommended answer
According to the documentation, Java regular expressions can't do this:
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
(emphasis added)
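The Javadoc's own example can be checked directly. In this minimal sketch (the class name JavadocExampleCheck and the helper groups are illustrative), the outer group (a(b)?) runs twice over "aba". The first pass captures "ab" and sets group 2 to "b". On the second pass (b)? does not take part, so group 2 keeps "b", while group 1 now holds only the last pass, "a".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative check of the Javadoc example: "aba" against (a(b)?)+
class JavadocExampleCheck {
    // Returns {group(1), group(2)} after a full match, or null if no match.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("(a(b)?)+").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("aba");
        // Group 1 holds only the last repetition; group 2 keeps the "b" from the
        // first repetition because (b)? did not participate in the second one.
        System.out.println("group(1) = " + g[0] + ", group(2) = " + g[1]);
        // prints "group(1) = a, group(2) = b"
    }
}
```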