Confusion in behavior of capturing groups in Java regex
Question
In this answer, I recommended using
s.replaceFirst("\\.0*$|(\\.\\d*?)0+$", "$1");
but two people complained that the result contained the string "null", e.g., 23.null. This could be explained by $1 (i.e., group(1)) being null, which could be converted via String.valueOf to the string "null". However, I always get the empty string. My test case covers this, and
assertEquals("23", removeTrailingZeros("23.00"));
passes. Is the exact behavior undefined?
Recommended answer
The documentation of the Matcher class in the reference implementation does not specify how the appendReplacement method behaves when the replacement string refers to a capturing group that did not capture anything (null). The behavior of the group method is clearly specified, but nothing is said about this case for appendReplacement.
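Below are 3 implementations that show how the handling of the case above differs:
- The reference implementation appends nothing (or, put differently, appends an empty string) in the case above.
- The implementations in GNU Classpath and Android append null in the case above.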
Some code has been omitted for the sake of brevity; the omissions are marked with ...
For the reference implementation (Sun/Oracle JDK and OpenJDK), the code for appendReplacement does not seem to have changed since Java 6, and it appends nothing when a capturing group does not capture anything:
} else if (nextChar == '$') {
    // Skip past $
    cursor++;
    // The first number is always a group
    int refNum = (int)replacement.charAt(cursor) - '0';
    if ((refNum < 0)||(refNum > 9))
        throw new IllegalArgumentException(
            "Illegal group reference");
    cursor++;
    // Capture the largest legal group string
    ...
    // Append group
    if (start(refNum) != -1 && end(refNum) != -1)
        result.append(text, start(refNum), end(refNum));
} else {
References
- jdk6/98e143b44620
- jdk8/687fd7c7986d
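The guard on start(refNum) and end(refNum) can be observed from user code. The following is a minimal sketch, not part of the original answer; the class and helper names (UnmatchedGroupDemo, groupOneStart, groupOne) are made up for illustration. It uses the pattern from the question and shows that group 1 reports a start index of -1 and a value of null when only the first alternative matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnmatchedGroupDemo {
    // Pattern from the question; group 1 only participates in the second alternative.
    static final Pattern TRAILING_ZEROS = Pattern.compile("\\.0*$|(\\.\\d*?)0+$");

    /** Returns start(1) of the first match, or Integer.MIN_VALUE if nothing matches. */
    static int groupOneStart(String s) {
        Matcher m = TRAILING_ZEROS.matcher(s);
        return m.find() ? m.start(1) : Integer.MIN_VALUE;
    }

    /** Returns group(1) of the first match, or null if nothing matches. */
    static String groupOne(String s) {
        Matcher m = TRAILING_ZEROS.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // "23.00" is matched by the first alternative, so group 1 does not participate.
        System.out.println(groupOneStart("23.00")); // -1
        System.out.println(groupOne("23.00"));      // null
        // "23.50" is matched by the second alternative, so group 1 captures ".5".
        System.out.println(groupOne("23.50"));      // .5
        // The result of the next line depends on the class library, as the answer
        // describes: the reference implementation skips the unmatched group.
        System.out.println("23.00".replaceFirst(TRAILING_ZEROS.pattern(), "$1"));
    }
}
```

Because start(1) is -1 for "23.00", the check in the excerpt above is false and nothing is appended for $1.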
GNU Classpath, which is a complete reimplementation of the Java Class Library, has a different implementation of appendReplacement for the case above. In Classpath, the classes in the java.util.regex package are just wrappers for the classes in gnu.java.util.regex.
Matcher.appendReplacement calls RE.getReplacement to process the replacement for the matched portion:
public Matcher appendReplacement (StringBuffer sb, String replacement)
    throws IllegalStateException
{
    assertMatchOp();
    sb.append(input.subSequence(appendPosition,
                                match.getStartIndex()).toString());
    sb.append(RE.getReplacement(replacement, match,
                                RE.REG_REPLACE_USE_BACKSLASHESCAPE));
    appendPosition = match.getEndIndex();
    return this;
}
RE.getReplacement calls REMatch.substituteInto to get the content of the capturing group and appends its result directly:
case '$':
    int i1 = i + 1;
    while (i1 < replace.length () &&
           Character.isDigit (replace.charAt (i1)))
        i1++;
    sb.append (m.substituteInto (replace.substring (i, i1)));
    i = i1 - 1;
    break;
REMatch.substituteInto appends the result of REMatch.toString(int) directly, without checking whether the capturing group has captured anything:
if ((input.charAt (pos) == '$')
    && (Character.isDigit (input.charAt (pos + 1))))
{
    // Omitted code parses the group number into val
    ...
    if (val < start.length)
    {
        output.append (toString (val));
    }
}
And REMatch.toString(int) returns null when the capturing group has not captured anything (irrelevant code has been omitted):
public String toString (int sub)
{
    if ((sub >= start.length) || sub < 0)
        throw new IndexOutOfBoundsException ("No group " + sub);
    if (start[sub] == -1)
        return null;
    ...
}
So in GNU Classpath's case, null will be appended to the string when the replacement string refers to a capturing group that fails to capture anything.
In Android, Matcher.appendReplacement calls the private method appendEvaluated, which in turn appends the result of group(int) directly to the output buffer.
public Matcher appendReplacement(StringBuffer buffer, String replacement) {
    buffer.append(input.substring(appendPos, start()));
    appendEvaluated(buffer, replacement);
    appendPos = end();
    return this;
}
private void appendEvaluated(StringBuffer buffer, String s) {
    boolean escape = false;
    boolean dollar = false;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '\\' && !escape) {
            escape = true;
        } else if (c == '$' && !escape) {
            dollar = true;
        } else if (c >= '0' && c <= '9' && dollar) {
            buffer.append(group(c - '0'));
            dollar = false;
        } else {
            buffer.append(c);
            dollar = false;
            escape = false;
        }
    }
    // This seemingly stupid piece of code reproduces a JDK bug.
    if (escape) {
        throw new ArrayIndexOutOfBoundsException(s.length());
    }
}
Since Matcher.group(int) returns null for a capturing group that fails to capture, Matcher.appendReplacement appends null when that capturing group is referred to in the replacement string.
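The reason a null group value turns into visible text is that StringBuffer.append(String) writes the four characters "null" when given a null argument. The sketch below is only an illustration with made-up names (NullAppendDemo, appendUnchecked, appendGuarded); it does not use the Android or Classpath sources, it just contrasts an unchecked append with a guarded one.

```java
public class NullAppendDemo {
    /** Appends the group value without checking it, as the unguarded code paths do. */
    static String appendUnchecked(String prefix, String groupValue) {
        StringBuffer buffer = new StringBuffer(prefix);
        buffer.append(groupValue); // a null String is appended as the text "null"
        return buffer.toString();
    }

    /** Appends the group value only if the group actually captured something. */
    static String appendGuarded(String prefix, String groupValue) {
        StringBuffer buffer = new StringBuffer(prefix);
        if (groupValue != null) {
            buffer.append(groupValue);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String unmatchedGroup = null; // what group(1) returns for a non-participating group
        System.out.println(appendUnchecked("23", unmatchedGroup)); // 23null
        System.out.println(appendGuarded("23", unmatchedGroup));   // 23
    }
}
```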
这两个抱怨您的人很可能在Android上运行他们的代码.
It is most likely that the 2 people complaining to you are running their code on Android.
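One way to avoid depending on this unspecified behavior is to skip the $1 replacement syntax and handle the null group explicitly. The following is a sketch under that assumption, not something proposed in the original answer; the class name PortableTrailingZeros is hypothetical, and removeTrailingZeros reuses the method name from the question's test case.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortableTrailingZeros {
    // Same pattern as in the question.
    private static final Pattern TRAILING_ZEROS = Pattern.compile("\\.0*$|(\\.\\d*?)0+$");

    /** Removes trailing zeros after the decimal point without using "$1". */
    static String removeTrailingZeros(String s) {
        Matcher m = TRAILING_ZEROS.matcher(s);
        if (!m.find()) {
            return s; // nothing to strip
        }
        String kept = m.group(1); // null when the first alternative matched
        return s.substring(0, m.start())
                + (kept == null ? "" : kept)
                + s.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(removeTrailingZeros("23.00"));  // 23
        System.out.println(removeTrailingZeros("23.50"));  // 23.5
        System.out.println(removeTrailingZeros("1.0100")); // 1.01
        System.out.println(removeTrailingZeros("100"));    // 100
    }
}
```

This relies only on group(int) returning null for a group that did not participate, which the answer describes as clearly specified, so it should not depend on how a particular class library implements appendReplacement.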