Why does split in Java 8 sometimes remove empty strings at the start of the result array?


Question

Before Java 8, when we split on an empty string like

String[] tokens = "abc".split("");

the split mechanism would split at the places marked with |

|a|b|c|

because an empty string "" exists before and after each character. So it would first generate this array

["", "a", "b", "c", ""]

and would then remove the trailing empty strings (because we didn't explicitly pass a negative value for the limit argument), so it would finally return

["", "a", "b", "c"]
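The effect of the limit argument on trailing empty strings can be checked on its own. The following is a minimal sketch, not part of the original question; the comma separator, the input "a,b,," and the helper name splitOnComma are chosen for illustration because they behave the same across Java versions.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper for this example: split on a literal comma with the given limit.
    static String[] splitOnComma(String input, int limit) {
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,";
        // limit == 0 is what split(regex) uses: trailing empty strings are dropped.
        System.out.println(Arrays.toString(splitOnComma(input, 0)));  // [a, b]
        // A negative limit keeps the trailing empty strings.
        System.out.println(Arrays.toString(splitOnComma(input, -1))); // [a, b, , ]
    }
}
```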






In Java 8 the split mechanism seems to have changed. Now when we use

"abc".split("")

we get the array ["a", "b", "c"] instead of ["", "a", "b", "c"], so it looks like empty strings at the start are also removed. But this theory fails because, for instance,

"abc".split("a")

returns an array with an empty string at the start: ["", "bc"].

Can someone explain what is going on here and how the rules of split for these cases have changed in Java 8?

Recommended answer

The behavior of String.split (which calls Pattern.split) changed between Java 7 and Java 8.

Comparing the documentation of Pattern.split in Java 7 and Java 8, we observe that the following clause was added:


When there is a positive-width match at the beginning of the input sequence then an empty leading substring is included at the beginning of the resulting array. A zero-width match at the beginning however never produces such empty leading substring.

The same clause was also added to String.split in Java 8, compared to Java 7.
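The clause can be observed directly through Pattern.split. This is a small sketch assuming a Java 8 or later runtime; the class name LeadingMatchDemo and the regex "x*" are illustrative choices, not taken from the original answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LeadingMatchDemo {
    // Thin wrapper around Pattern.split with the default limit of 0.
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // Positive-width match at index 0: an empty leading substring is kept.
        System.out.println(Arrays.toString(split("a", "abc")));  // [, bc]
        // Zero-width match at index 0: on Java 8 and later there is no empty leading substring.
        System.out.println(Arrays.toString(split("", "abc")));   // [a, b, c]
        System.out.println(Arrays.toString(split("x*", "abc"))); // [a, b, c]
    }
}
```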

Let us compare the code of Pattern.split in the reference implementation of Java 7 and Java 8. The code was retrieved from grepcode, for versions 7u40-b43 and 8-b132.

Java 7

public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}



Java 8

public String[] split(CharSequence input, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<>();
    Matcher m = matcher(input);

    // Add segments before each match found
    while(m.find()) {
        if (!matchLimited || matchList.size() < limit - 1) {
            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }
            String match = input.subSequence(index, m.start()).toString();
            matchList.add(match);
            index = m.end();
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index,
                                             input.length()).toString();
            matchList.add(match);
            index = m.end();
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] {input.toString()};

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}

The addition of the following code in Java 8 excludes the zero-length match at the beginning of the input string, which explains the behavior above.

            if (index == 0 && index == m.start() && m.start() == m.end()) {
                // no empty leading substring included for zero-width match
                // at the beginning of the input char sequence.
                continue;
            }



Maintaining compatibility

Following behavior in Java 8 and above

To make split behave consistently across versions and compatible with the behavior in Java 8:


  1. If your regex can match a zero-length string, just add (?!\A) at the end of the regex and wrap the original regex in a non-capturing group (?:...) (if necessary).
  2. If your regex can't match a zero-length string, you don't need to do anything.
  3. If you don't know whether the regex can match a zero-length string or not, do both actions in step 1.

(?!\A) checks that the match does not end at the beginning of the string; a match that ends there can only be an empty match at the beginning of the string.
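The steps above can be sketched as a small helper. The name noEmptyMatchAtStart is hypothetical, and the sketch always wraps the regex (the "if you don't know" case in step 3) rather than deciding whether wrapping is necessary.

```java
import java.util.Arrays;

class SplitCompat {
    // Hypothetical helper: wraps the regex in a non-capturing group and appends
    // (?!\A), so that a match can never end at the very start of the input.
    static String noEmptyMatchAtStart(String regex) {
        return "(?:" + regex + ")(?!\\A)";
    }

    public static void main(String[] args) {
        // A regex that can match the empty string: the zero-width match at index 0 is ruled out.
        System.out.println(Arrays.toString("abc".split(noEmptyMatchAtStart("x*")))); // [a, b, c]
        // A regex that cannot match the empty string: the result is unchanged.
        System.out.println(Arrays.toString("abc".split(noEmptyMatchAtStart("a"))));  // [, bc]
    }
}
```

Because the rewritten regex never produces a zero-width match at index 0, the branch that differs between the two listings is never reached, so the Java 7 code quoted above should yield the same arrays.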

There is no general solution to make split backward-compatible with Java 7 and prior, short of replacing all instances of split with calls to your own custom implementation.
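One possible shape for such a custom implementation is sketched below. It is adapted from the Java 7 listing quoted earlier; the class and method names (LegacySplit, splitJava7Style) are hypothetical, and it is a sketch under that assumption rather than a drop-in replacement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LegacySplit {
    // Hypothetical helper mirroring the Java 7 listing: it keeps the empty leading
    // substring produced by a zero-width match at the start of the input.
    static String[] splitJava7Style(Pattern pattern, CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        List<String> matchList = new ArrayList<>();
        Matcher m = pattern.matcher(input);

        // Add the segment before each match found
        while (m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                matchList.add(input.subSequence(index, m.start()).toString());
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last allowed piece
                matchList.add(input.subSequence(index, input.length()).toString());
                index = m.end();
            }
        }

        // Same check as the listing: index still 0 is treated as "no match found"
        if (index == 0) {
            return new String[] {input.toString()};
        }

        // Add the remaining segment
        if (!matchLimited || matchList.size() < limit) {
            matchList.add(input.subSequence(index, input.length()).toString());
        }

        // With limit == 0, drop trailing empty strings
        int resultSize = matchList.size();
        if (limit == 0) {
            while (resultSize > 0 && matchList.get(resultSize - 1).isEmpty()) {
                resultSize--;
            }
        }
        return matchList.subList(0, resultSize).toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints [, a, b, c], the pre-Java 8 result described in the question.
        System.out.println(Arrays.toString(splitJava7Style(Pattern.compile(""), "abc", 0)));
    }
}
```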
