How exactly does the String.split() method in Java work when a regex is provided?
Question
I'm preparing for the OCPJP exam and I ran into the following example:
class Test {
    public static void main(String args[]) {
        String test = "I am preparing for OCPJP";
        // "\\S" is the regex \S: split on every non-whitespace character
        String[] tokens = test.split("\\S");
        System.out.println(tokens.length); // prints 16
    }
}
This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain to me what the split() method actually does in this case? I just don't get it...
Recommended answer
It splits on every match of "\\S", which the regex engine sees as \S, the class of non-whitespace characters.
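To make the escaping concrete, here is a minimal sketch (the class name NonWhitespaceDemo and the helper isSplitPoint are hypothetical, made up for illustration). It shows that the Java literal "\\S" holds two characters, and uses the standard Pattern.matches to check whether a single character is one the regex would split on.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: what the pattern passed to split() actually is.
class NonWhitespaceDemo {
    // The Java literal "\\S" is two characters: a backslash and 'S'.
    // The regex engine reads that pair as \S, "any non-whitespace character".
    static final String REGEX = "\\S";

    // Hypothetical helper: true if split(REGEX) would cut at this character.
    static boolean isSplitPoint(char c) {
        // Pattern.matches compiles REGEX and tests it against the whole one-char input.
        return Pattern.matches(REGEX, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(REGEX.length());     // 2
        System.out.println(isSplitPoint('x'));  // true
        System.out.println(isSplitPoint(' '));  // false
        System.out.println(isSplitPoint('\t')); // false: tab is whitespace too
    }
}
```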
So let's try to split "x x" on non-whitespace (\S). Since this regex matches exactly one character at a time, let's iterate over the characters and mark the places where the string gets split (we will use a pipe | for that).
- Is 'x' non-whitespace? YES, so let's mark it: "| x"
- Is ' ' non-whitespace? NO, so we leave it as is: "| x"
- Is the last 'x' non-whitespace? YES, so let's mark it: "| |"
So as a result we need to split our string at the start and at the end, which initially gives us the result array
["", " ", ""]
   ^    ^ <- here we split
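The marking procedure above can be sketched by hand with java.util.regex.Matcher. This is an illustrative helper (splitKeepingAll and MarkAndCut are hypothetical names, not the JDK's implementation), and it assumes the regex always consumes at least one character, as \S does. It keeps every piece, so it reproduces the initial array before any trailing empty strings are removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: cut the input at every match and keep ALL pieces,
// including empty ones (the "initial" array, before trailing empties are dropped).
// Assumes every match is at least one character long, like \S.
class MarkAndCut {
    static List<String> splitKeepingAll(String input, String regex) {
        List<String> pieces = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0; // beginning of the piece currently being built
        while (m.find()) {
            // Each match is a "|" mark: the text before it becomes a piece,
            // and the matched character itself is discarded.
            pieces.add(input.substring(start, m.start()));
            start = m.end();
        }
        pieces.add(input.substring(start)); // whatever follows the last mark
        return pieces;
    }

    public static void main(String[] args) {
        List<String> pieces = splitKeepingAll("x x", "\\S");
        System.out.println(pieces.size()); // 3
        for (String p : pieces) {
            System.out.println("\"" + p + "\""); // prints "" then " " then ""
        }
    }
}
```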
But since trailing empty strings are removed, the result would be
[""," "]    <- result
       ,""] <- removed trailing empty string
So split returns the array ["", " "], which contains only two elements.
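A quick check with the real String.split confirms the two-element result. Each element is wrapped in brackets so the empty string and the single space are visible; the class SplitXX and the wrapper splitOnNonWhitespace are hypothetical names used only for this illustration.

```java
// Hypothetical demo: the real String.split on "x x".
class SplitXX {
    // Thin wrapper around the actual JDK call.
    static String[] splitOnNonWhitespace(String s) {
        return s.split("\\S");
    }

    public static void main(String[] args) {
        String[] parts = splitOnNonWhitespace("x x");
        System.out.println(parts.length); // 2
        for (String t : parts) {
            // Brackets make "" and " " distinguishable: prints [] then [ ]
            System.out.println("[" + t + "]");
        }
    }
}
```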
BTW, to stop split from removing trailing empty strings you need to use split(regex, limit) with a negative limit, like split("\\S", -1).
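A minimal side-by-side comparison of the two overloads. Per the String.split Javadoc, split(regex) behaves like split(regex, 0), which is the variant that discards trailing empty strings; the class LimitDemo and the helper pieces are hypothetical names for this sketch.

```java
// Hypothetical demo: limit 0 versus a negative limit on the same input.
class LimitDemo {
    static int pieces(String s, int limit) {
        return s.split("\\S", limit).length;
    }

    public static void main(String[] args) {
        // limit 0 (what split(regex) uses): trailing empty strings are dropped.
        System.out.println(pieces("x x", 0));  // 2
        // Negative limit: the pattern is applied as many times as possible
        // and trailing empty strings are kept.
        System.out.println(pieces("x x", -1)); // 3
    }
}
```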
Now let's get back to your example. With your data you are splitting on each of the marked characters:
I am preparing for OCPJP
| || ||||||||| ||| |||||
This means the string is cut into these pieces:
""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""
So this represents the array
[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]
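The piece count can be checked as well: every non-whitespace character is a split point, so 20 marks should produce 21 pieces when nothing is removed. A sketch, with CountPieces and countMatches as hypothetical names; split("\\S", -1) is used so that no trailing pieces are discarded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: count the split points and the resulting pieces.
class CountPieces {
    // Hypothetical helper: number of (non-overlapping) matches of regex in input.
    static int countMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        System.out.println(countMatches(test, "\\S"));    // 20 non-whitespace characters
        System.out.println(test.split("\\S", -1).length); // 21 pieces: one more than the marks
    }
}
```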
But since trailing empty strings "" are removed (when they were produced by the split; more info at: Confusing output from String.split, https://stackoverflow.com/questions/25056607/confusing-output-from-string-split/25058091#25058091)
[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]
                                                     ^^ ^^ ^^ ^^ ^^
you get a result array that contains only this part:
[""," ",""," ","","","","","","","",""," ","",""," "]
which is exactly 16 elements.
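Finally, a sketch that reproduces the trimming step by hand: take the full 21-element array from split("\\S", -1) and drop trailing empty strings. TrailingTrim and dropTrailingEmpty are hypothetical names written for this illustration, not part of the JDK; the helper only mimics the limit-0 behaviour for this kind of input.

```java
import java.util.Arrays;

// Hypothetical demo: trailing-empty removal done manually.
class TrailingTrim {
    // Hypothetical helper: drops trailing "" elements, mimicking what
    // split(regex) (limit 0) does to the full array.
    static String[] dropTrailingEmpty(String[] parts) {
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.copyOf(parts, end);
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        String[] all = test.split("\\S", -1);     // 21 pieces, nothing removed
        String[] trimmed = dropTrailingEmpty(all); // last five "" removed
        System.out.println(trimmed.length);                            // 16
        System.out.println(Arrays.equals(trimmed, test.split("\\S"))); // true
    }
}
```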