Regex Named Groups in Java
Question
It is my understanding that the java.util.regex package does not support named groups (http://www.regular-expressions.info/named.html), so can anyone point me towards a third-party library that does?
I've looked at jregex, but its last release was in 2002 and it didn't work for me under Java 5 (admittedly, I only tried briefly).
Recommended answer
(Update: August 2011)
As geofflane mentions in his answer, Java 7 now supports named groups.
tchrist points out in a comment that the support is limited.
He details the limitations in his great answer "Java Regex Helper".
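A quick way to see some of these limits in practice is to ask the JDK which patterns it accepts. In the standard java.util.regex implementation, a group name must start with an ASCII letter and may contain only ASCII letters and digits, and a given name may be defined only once per pattern. The class and method names below are illustrative only; this is a small probe, not a full list of tchrist's points.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative probe (hypothetical class name): reports whether the JDK accepts a pattern.
class NamedGroupLimits {
    // Returns true if Pattern.compile succeeds, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?<year>\\d{4})"));  // ASCII letters only: accepted
        System.out.println(compiles("(?<a>x)|(?<a>y)"));  // same name defined twice: rejected
        System.out.println(compiles("(?<my_name>x)"));    // '_' is not allowed in a name: rejected
        System.out.println(compiles("(?<caf\u00e9>x)"));  // non-ASCII letter in a name: rejected
    }
}
```

The duplicate-name case is the one that bites when you do not control the pattern, for example when alternatives written by different people are concatenated.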
Java 7 regex named group support was presented back in September 2010 on Oracle's blog.
In the official release of Java 7, the constructs that support named capturing groups are:
- (?<name>capturing text) to define a named group "name"
- \k<name> to backreference a named group "name"
- ${name} to reference the captured group in Matcher's replacement string
- Matcher.group(String name) to return the input subsequence captured by the given named group
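The four constructs above can be exercised together in a short Java 7+ sketch. The "doubled word" task and the class name are just illustrative choices: (?<word>...) defines the group, \k<word> backreferences it inside the pattern, Matcher.group("word") reads it back, and ${word} refers to it in a replacement string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative example (hypothetical class name) using only standard Java 7+ APIs.
class NamedGroupConstructs {
    // (?<word>...) defines the named group; \k<word> must match the same text again.
    static final Pattern DOUBLED = Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b");

    // Matcher.group(String) returns what the named group captured, or null if no match is found.
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED.matcher(text);
        return m.find() ? m.group("word") : null;
    }

    // ${word} in the replacement string refers to the named group.
    static String collapseDoubledWords(String text) {
        return DOUBLED.matcher(text).replaceAll("${word}");
    }

    public static void main(String[] args) {
        String text = "this is is a test test";
        System.out.println(firstDoubledWord(text));     // is
        System.out.println(collapseDoubledWords(text)); // this is a test
    }
}
```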
Other alternatives before Java 7 were:
- Google named-regex (see John Hardy's answer). Gábor Lipták mentions (November 2012) that this project might not be active (with several outstanding bugs), and its GitHub fork could be considered instead.
- jregex (see Brian Clozel's answer)
(Original answer: Jan 2009, with the next two links now broken)
You cannot refer to named groups unless you code your own version of Regex...
That is precisely what Gorbush2 did in this thread.
(This is a limited implementation, as tchrist points out again, since it only looks for ASCII identifiers. tchrist details the limitations as:
only being able to have one named group per name (which you don't always have control over!) and not being able to use them for in-regex recursion.
Note: you can find true regex recursion examples in Perl and PCRE regexes, as mentioned in Regexp Power, the PCRE specs and the "Matching Strings with Balanced Parentheses" slide.)
Example:
String:
"TEST 123"
Regex:
"(?<login>\w+) (?<id>\d+)"
Access:
matcher.group(1) ==> TEST
matcher.group("login") ==> TEST
matcher.name(1) ==> login
Replacement:
matcher.replaceAll("aaaaa_$1_sssss_$2____") ==> aaaaa_TEST_sssss_123____
matcher.replaceAll("aaaaa_${login}_sssss_${id}____") ==> aaaaa_TEST_sssss_123____
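For comparison, the same example can be written against the standard library on Java 7 or later, without a patched Pattern class. This sketch (class and method names are illustrative) reproduces group(1), group("login") and both replacements; matcher.name(1) belongs to Gorbush2's implementation, and the Java 7 Matcher has no direct equivalent, so it is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative Java 7+ version of the "TEST 123" example (hypothetical class name).
class LoginIdExample {
    static final Pattern P = Pattern.compile("(?<login>\\w+) (?<id>\\d+)");

    // Returns {group(1), group("login"), group("id")}, or null if the whole input does not match.
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group("login"), m.group("id") };
    }

    // Applies a replacement string that may use $n or ${name} references.
    static String replace(String input, String replacement) {
        return P.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String[] g = groups("TEST 123");
        System.out.println(g[0] + " " + g[1] + " " + g[2]);                       // TEST TEST 123
        System.out.println(replace("TEST 123", "aaaaa_$1_sssss_$2____"));         // aaaaa_TEST_sssss_123____
        System.out.println(replace("TEST 123", "aaaaa_${login}_sssss_${id}____")); // aaaaa_TEST_sssss_123____
    }
}
```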
(extract from the implementation)
public final class Pattern
    implements java.io.Serializable
{
    [...]
    /**
     * Parses a group and returns the head node of a set of nodes that process
     * the group. Sometimes a double return system is used where the tail is
     * returned in root.
     */
    private Node group0() {
        boolean capturingGroup = false;
        Node head = null;
        Node tail = null;
        int save = flags;
        root = null;
        int ch = next();
        if (ch == '?') {
            ch = skip();
            switch (ch) {
            case '<':   // (?<xxx)  look behind or group name
                ch = read();
                int start = cursor;
                [...]
                // test forGroupName
                int startChar = ch;
                while(ASCII.isWord(ch) && ch != '>') ch=read();
                if(ch == '>'){
                    // valid group name
                    int len = cursor-start;
                    int[] newtemp = new int[2*(len) + 2];
                    //System.arraycopy(temp, start, newtemp, 0, len);
                    StringBuilder name = new StringBuilder();
                    for(int i = start; i< cursor; i++){
                        name.append((char)temp[i-1]);
                    }
                    // create Named group
                    head = createGroup(false);
                    ((GroupTail)root).name = name.toString();
                    capturingGroup = true;
                    tail = root;
                    head.next = expr(tail);
                    break;
                }
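Short of patching Pattern as in the extract above, a lighter workaround that also runs on Java 5/6 is to preprocess the pattern string: rewrite each (?<name> into a plain capturing (, remember which group number it became, and look names up in that map. This is a swapped-in technique, not Gorbush2's code. The class NamedPattern and its group(Matcher, String) method are hypothetical, and the scanner assumes no \Q...\E quoting, no nested character classes and no \k<name> backreferences; like the patched version, it supports only one group per name (a repeated name silently overwrites the earlier entry).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites (?<name>...) into (...) and records each name's group number.
// Assumptions: no \Q...\E quoting, no nested character classes, no \k<name> backreferences.
class NamedPattern {
    final Pattern pattern;
    final Map<String, Integer> groupIndex = new HashMap<String, Integer>();

    NamedPattern(String regex) {
        StringBuilder out = new StringBuilder();
        int group = 0;            // number of capturing groups seen so far
        boolean inClass = false;  // inside [...], where '(' is literal
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                out.append(c).append(regex.charAt(++i)); // copy the escape pair verbatim
                continue;
            }
            if (inClass) {
                if (c == ']') inClass = false;
                out.append(c);
                continue;
            }
            if (c == '[') {
                inClass = true;
                out.append(c);
                continue;
            }
            if (c == '(') {
                if (i + 1 < regex.length() && regex.charAt(i + 1) == '?') {
                    // (?<name> starts with a letter; (?<= and (?<! are lookbehinds
                    int close = regex.indexOf('>', i + 3);
                    boolean named = i + 3 < regex.length() && regex.charAt(i + 2) == '<'
                            && Character.isLetter(regex.charAt(i + 3)) && close > 0;
                    if (named) {
                        groupIndex.put(regex.substring(i + 3, close), ++group);
                        out.append('(');
                        i = close; // skip "?<name>"
                        continue;
                    }
                    // other (?...) constructs are non-capturing: copied unchanged
                } else {
                    group++; // plain capturing group
                }
            }
            out.append(c);
        }
        pattern = Pattern.compile(out.toString());
    }

    // Looks up the group number recorded for the name and delegates to Matcher.group(int).
    String group(Matcher m, String name) {
        Integer idx = groupIndex.get(name);
        if (idx == null) throw new IllegalArgumentException("No group named " + name);
        return m.group(idx);
    }

    public static void main(String[] args) {
        NamedPattern np = new NamedPattern("(?<login>\\w+) (?<id>\\d+)");
        Matcher m = np.pattern.matcher("TEST 123");
        if (m.matches()) {
            System.out.println(np.group(m, "login")); // TEST
            System.out.println(np.group(m, "id"));    // 123
        }
    }
}
```

Keeping the rewrite outside Pattern means the JDK's own engine still does all the matching; the cost is that the name-to-number bookkeeping has to mirror Pattern's group numbering exactly, which is why the unsupported constructs are listed as assumptions.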