Confusion in behavior of capturing groups in Java regex


Question

In this answer, I recommended using

s.replaceFirst("\\.0*$|(\\.\\d*?)0+$", "$1");

but two people complained that the result contained the string "null", e.g., 23.null. This could be explained by $1 (i.e., group(1)) being null, which String.valueOf would convert to the string "null". However, I always get the empty string. My test case covers it, and

assertEquals("23", removeTrailingZeros("23.00"));

passes. Is the exact behavior undefined?
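To make the situation concrete, here is a minimal sketch of the case being asked about. The class name `TrailingZerosDemo` and the helper `groupOneIsUnset` are made up for illustration, and the body of `removeTrailingZeros` is assumed to be just the `replaceFirst` call above. For "23.00" the first alternative matches, so group 1 never takes part in the match; what `$1` then expands to is the open question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingZerosDemo {
    static final String REGEX = "\\.0*$|(\\.\\d*?)0+$";

    // Assumed body of the method under test: the replaceFirst call from the question.
    static String removeTrailingZeros(String s) {
        return s.replaceFirst(REGEX, "$1");
    }

    // Hypothetical helper: true if REGEX matches in s and group 1 did not participate.
    static boolean groupOneIsUnset(String s) {
        Matcher m = Pattern.compile(REGEX).matcher(s);
        return m.find() && m.group(1) == null;
    }

    public static void main(String[] args) {
        // First alternative matches ".00", so group 1 is unset.
        System.out.println(groupOneIsUnset("23.00"));
        // Second alternative matches ".50", so group 1 holds ".5".
        System.out.println(groupOneIsUnset("23.50"));
        // The result of this call depends on the library implementation.
        System.out.println(removeTrailingZeros("23.00"));
    }
}
```

Only the last call is implementation-dependent; whenever group 1 participates in the match, `$1` is well defined.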

Recommended answer

The documentation of the Matcher class from the reference implementation doesn't specify how the appendReplacement method behaves when the replacement string refers to a capturing group that didn't capture anything (null). While the behavior of the group method is clear, nothing is said about this case for appendReplacement.

Below are three implementations that differ in how they handle the case above:

  • The reference implementation appends nothing (or, one could say, appends an empty string).
  • The GNU Classpath and Android implementations append null.

Some code has been omitted for the sake of brevity; omissions are indicated by "...".

For the reference implementation (Sun/Oracle JDK and OpenJDK), the code for appendReplacement doesn't seem to have changed since Java 6, and it appends nothing when a capturing group doesn't capture anything:

        } else if (nextChar == '$') {
            // Skip past $
            cursor++;
            // The first number is always a group
            int refNum = (int)replacement.charAt(cursor) - '0';
            if ((refNum < 0)||(refNum > 9))
                throw new IllegalArgumentException(
                    "Illegal group reference");
            cursor++;

            // Capture the largest legal group string
            ...

            // Append group
            if (start(refNum) != -1 && end(refNum) != -1)
                result.append(text, start(refNum), end(refNum));
        } else {
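The guard at the end of that excerpt relies on documented behavior of the public Matcher API: for a group that did not take part in a successful match, start(int) and end(int) return -1 and group(int) returns null. The sketch below applies the same check from user code. The class `UnsetGroupGuard` and the helper `groupOrEmpty` are hypothetical names for illustration, not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnsetGroupGuard {
    // Hypothetical helper: returns the group's text, or "" if the group
    // did not take part in the match (start/end report -1 in that case).
    static String groupOrEmpty(Matcher m, int group) {
        if (m.start(group) != -1 && m.end(group) != -1) {
            return m.group(group);
        }
        return "";
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\.0*$|(\\.\\d*?)0+$").matcher("23.00");
        if (m.find()) {
            System.out.println(m.start(1));         // start index of the unset group
            System.out.println(m.group(1) == null); // whether the unset group is null
            System.out.println("[" + groupOrEmpty(m, 1) + "]");
        }
    }
}
```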

References

  • jdk6/98e143b44620
  • jdk8/687fd7c7986d
GNU Classpath, a complete reimplementation of the Java Class Library, has a different implementation of appendReplacement for the case above. In Classpath, the classes in the java.util.regex package are just wrappers for the classes in gnu.java.util.regex.

Matcher.appendReplacement calls RE.getReplacement to process the replacement for the matched portion:

      public Matcher appendReplacement (StringBuffer sb, String replacement)
        throws IllegalStateException
      {
        assertMatchOp();
        sb.append(input.subSequence(appendPosition,
                                    match.getStartIndex()).toString());
        sb.append(RE.getReplacement(replacement, match,
            RE.REG_REPLACE_USE_BACKSLASHESCAPE));
        appendPosition = match.getEndIndex();
        return this;
      }
    

RE.getReplacement calls REMatch.substituteInto to get the content of the capturing group and appends the result directly:

                      case '$':
                        int i1 = i + 1;
                        while (i1 < replace.length () &&
                               Character.isDigit (replace.charAt (i1)))
                          i1++;
                        sb.append (m.substituteInto (replace.substring (i, i1)));
                        i = i1 - 1;
                        break;
    

REMatch.substituteInto appends the result of REMatch.toString(int) directly, without checking whether the capturing group has captured anything:

            if ((input.charAt (pos) == '$')
                && (Character.isDigit (input.charAt (pos + 1))))
              {
                // Omitted code parses the group number into val
                ...
    
                if (val < start.length)
                  {
                    output.append (toString (val));
                  }
              }
    

REMatch.toString(int) returns null when the capturing group has not captured anything (other code omitted):

      public String toString (int sub)
      {
        if ((sub >= start.length) || sub < 0)
          throw new IndexOutOfBoundsException ("No group " + sub);
        if (start[sub] == -1)
          return null;
        ...
      }
    

So in GNU Classpath's case, the text null is appended to the result when the replacement string refers to a capturing group that failed to capture anything.
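The reason a null reference ends up as visible text is standard library behavior rather than anything regex-specific: StringBuffer.append(String) appends the four characters "null" when given a null argument, just as String.valueOf((Object) null) yields "null". A minimal sketch, with the class `NullAppendDemo` and method `appendUnguarded` as made-up names:

```java
class NullAppendDemo {
    // Simulates appending an unset group's value without a guard.
    static String appendUnguarded(String prefix, String groupValue) {
        StringBuffer sb = new StringBuffer(prefix);
        sb.append(groupValue); // a null String is appended as the text "null"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendUnguarded("23", null));
        System.out.println(appendUnguarded("23", ".5"));
    }
}
```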

In Android, Matcher.appendReplacement calls the private method appendEvaluated, which in turn appends the result of group(int) directly to the output buffer.

    public Matcher appendReplacement(StringBuffer buffer, String replacement) {
        buffer.append(input.substring(appendPos, start()));
        appendEvaluated(buffer, replacement);
        appendPos = end();
        return this;
    }
    
    private void appendEvaluated(StringBuffer buffer, String s) {
        boolean escape = false;
        boolean dollar = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && !escape) {
                escape = true;
            } else if (c == '$' && !escape) {
                dollar = true;
            } else if (c >= '0' && c <= '9' && dollar) {
                buffer.append(group(c - '0'));
                dollar = false;
            } else {
                buffer.append(c);
                dollar = false;
                escape = false;
            }
        }
        // This seemingly stupid piece of code reproduces a JDK bug.
        if (escape) {
            throw new ArrayIndexOutOfBoundsException(s.length());
        }
    }
    

Since Matcher.group(int) returns null for a capturing group that fails to capture, Matcher.appendReplacement appends null when that group is referred to in the replacement string.

It is most likely that the two people complaining to you are running their code on Android.
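Given that the behavior differs between implementations, one way to sidestep it is to never reference a group that may be unset. The following is only a sketch of one possible workaround, not something proposed in the original answer: it splits the replacement into two steps so that `$1` is used only in a pattern where group 1 always participates. The class name `TwoStepTrailingZeros` is hypothetical, and the equivalence with the original one-step regex has been checked only for inputs like those shown, not proven in general.

```java
class TwoStepTrailingZeros {
    static String removeTrailingZeros(String s) {
        // Step 1: drop trailing zeros after the decimal point. Group 1 takes
        // part in every match of this pattern, so $1 is never unset.
        String withoutZeros = s.replaceFirst("(\\.\\d*?)0+$", "$1");
        // Step 2: drop a decimal point left dangling at the end.
        return withoutZeros.replaceFirst("\\.$", "");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"23.00", "23.50", "23", "100", "1.000100"}) {
            System.out.println(s + " -> " + removeTrailingZeros(s));
        }
    }
}
```

The second step uses an empty replacement string, so it contains no group reference at all.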
