How exactly does the String.split() method in Java work when a regex is provided?

Problem description

I'm preparing for the OCPJP exam and I ran into the following example:

class Test {
   public static void main(String[] args) {
      String test = "I am preparing for OCPJP";
      // "\\S" is the Java string literal for the regex \S (any non-whitespace character)
      String[] tokens = test.split("\\S");
      System.out.println(tokens.length); // prints 16
   }
}

This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain what the split() method actually does in this case? I just don't get it...

Recommended answer

It splits on every match of "\\S", which the regex engine reads as \S: a single non-whitespace character.

So let's try to split "x x" on non-whitespace (\S). Since this regex matches exactly one character at a time, we can walk through the characters one by one and mark each place where a split happens (we will use a pipe | for that).

  • Is the first 'x' non-whitespace? Yes, so we mark it, and the string becomes "| x"
  • Is ' ' non-whitespace? No, so we leave it as is
  • Is the last 'x' non-whitespace? Yes, so we mark it, and the string becomes "| |"
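
The marking step can be reproduced with String.replaceAll, which substitutes every regex match with a pipe. This is only a small illustration: markSplits is a hypothetical helper name, not something from the original answer or from the JDK.

```java
public class SplitMarks {
    // Hypothetical helper: replaces every match of the regex with '|'
    // so the positions where split() would cut become visible.
    static String markSplits(String input, String regex) {
        return input.replaceAll(regex, "|");
    }

    public static void main(String[] args) {
        // Each non-whitespace character is one split point.
        System.out.println(markSplits("x x", "\\S")); // prints | |
    }
}
```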

As a result, we need to split our string at the start and at the end, which initially gives us this array:

["", " ", ""]
   ^    ^ - here we split

But since trailing empty strings are removed, the result would be

[""," "]     <- result
        ,""] <- removed trailing empty string

So split returns the array ["", " "], which contains only two elements.
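
The two steps described above (cut at every match, then drop trailing empty strings) can be sketched by hand with Pattern and Matcher. This is a simplified illustration and not the JDK's actual implementation: manualSplit is a hypothetical helper, and it assumes a regex that never matches the empty string and no limit argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ManualSplitSketch {
    // Hypothetical helper mimicking split(regex) for regexes that
    // only produce non-empty matches (such as \S).
    static String[] manualSplit(String input, String regex) {
        List<String> pieces = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        boolean found = false;
        // Step 1: cut the input at every match, keeping empty pieces.
        while (m.find()) {
            found = true;
            pieces.add(input.substring(start, m.start()));
            start = m.end();
        }
        if (!found) {
            return new String[] { input }; // no match: input is returned whole
        }
        pieces.add(input.substring(start)); // piece after the last match
        // Step 2: drop trailing empty strings.
        int size = pieces.size();
        while (size > 0 && pieces.get(size - 1).isEmpty()) {
            size--;
        }
        return pieces.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] byHand = manualSplit("x x", "\\S");
        String[] builtIn = "x x".split("\\S");
        System.out.println(byHand.length);                  // prints 2
        System.out.println(Arrays.equals(byHand, builtIn)); // prints true
    }
}
```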

BTW, to turn off the removal of trailing empty strings, use split(regex, limit) with a negative limit, like split("\\S", -1).
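
A short comparison of the two overloads on the "x x" example may help here. The class and method names below are made up for the illustration; only String.split(String) and String.split(String, int) come from the JDK.

```java
public class SplitLimitDemo {
    // Hypothetical helper: number of tokens produced for a given limit.
    static int countTokens(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        // split(regex) behaves like split(regex, 0): trailing empty strings are dropped.
        System.out.println(countTokens("x x", "\\S", 0));  // prints 2
        // A negative limit keeps the trailing empty string.
        System.out.println(countTokens("x x", "\\S", -1)); // prints 3
    }
}
```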

Now let's get back to your example. With your data, you are splitting on each of the marked characters:

I am preparing for OCPJP
| || ||||||||| ||| |||||

which means

 ""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""

so this represents the array

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  

but since trailing empty strings "" are removed (when they were produced by the split itself; more info at: Confusing output from String.split)

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  
                                                     ^^ ^^ ^^ ^^ ^^

the resulting array contains only this part:

[""," ",""," ","","","","","","","",""," ","",""," "]  

which is exactly 16 elements.
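
The counts in this walkthrough can be checked with a few lines of Java. The sketch below uses hypothetical names (SplitCountDemo, nonWhitespaceCount) and the sentence from the question; a negative limit is used to see the array before trailing empty strings are removed.

```java
public class SplitCountDemo {
    // Hypothetical helper: counts the characters that \S matches
    // by deleting everything that \s matches.
    static int nonWhitespaceCount(String s) {
        return s.replaceAll("\\s", "").length();
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";

        int splitPoints = nonWhitespaceCount(test);    // one split per non-whitespace character
        int allPieces = test.split("\\S", -1).length;  // trailing empty strings kept
        int finalPieces = test.split("\\S").length;    // trailing empty strings removed

        System.out.println(splitPoints);             // prints 20
        System.out.println(allPieces);               // prints 21
        System.out.println(finalPieces);             // prints 16
        System.out.println(allPieces - finalPieces); // prints 5
    }
}
```

So the intuition of "number of split characters + 1" holds for the untrimmed array (20 + 1 = 21); the 16 comes from dropping the five trailing empty strings.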
