String.replaceAll(regex) makes the same replacement twice
Question
Can anyone tell me why
System.out.println("test".replaceAll(".*", "a"));
results in
aa
Note that the following gives the same result:
System.out.println("test".replaceAll(".*$", "a"));
I have tested this on Java 6 and 7, and both behave the same way. Am I missing something, or is this a bug in the Java regex engine?
Recommended answer
This is not an anomaly: .* can match anything, including the empty string.
You ask to replace all occurrences:
- the first match consumes the whole string, so the regex engine starts its search for the next match at the end of the input;
- but .* also matches an empty string! It therefore matches the empty string at the end of the input and replaces it with a.
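To make the two matches visible, here is a minimal sketch that lists every match the engine finds instead of replacing it. The class name ReplaceAllMatches and the helper describeMatches are made up for this illustration; only Pattern, Matcher and String.replaceAll come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllMatches {
    // Hypothetical helper: reports each match as "[start,end) 'matched text'".
    static List<String> describeMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add("[" + m.start() + "," + m.end() + ") '" + m.group() + "'");
        }
        return found;
    }

    public static void main(String[] args) {
        // Two matches: the whole string, then the empty string at the end.
        System.out.println(describeMatches(".*", "test")); // [[0,4) 'test', [4,4) '']
        // Each match is replaced by "a", hence two a's.
        System.out.println("test".replaceAll(".*", "a"));  // aa
    }
}
```

The same two matches show up with ".*$", which is why the second snippet in the question also prints aa.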
Using .+ instead will not exhibit this problem, since that regex cannot match an empty string (it requires at least one character to match).
Or, use .replaceFirst() to replace only the first occurrence:
"test".replaceFirst(".*", "a")
^^^^^^^^^^^^
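Both workarounds can be compared side by side. This is only a sketch; the class ReplaceAllFixes and its two helper methods are hypothetical names introduced for the comparison.

```java
public class ReplaceAllFixes {
    // .+ needs at least one character, so the empty string at the end never matches.
    static String withPlus(String input) {
        return input.replaceAll(".+", "a");
    }

    // replaceFirst stops after the first match, so the trailing empty match is never tried.
    static String withReplaceFirst(String input) {
        return input.replaceFirst(".*", "a");
    }

    public static void main(String[] args) {
        System.out.println(withPlus("test"));         // a
        System.out.println(withReplaceFirst("test")); // a
        // The two differ on empty input: .+ finds nothing, .* matches the empty string once.
        System.out.println("[" + withPlus("") + "]");         // []
        System.out.println("[" + withReplaceFirst("") + "]"); // [a]
    }
}
```

Which one to pick depends on what should happen for an empty input: .+ leaves it untouched, while replaceFirst with .* still inserts the replacement.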
Now, why .* behaves the way it does, and why it does not match more than twice (it theoretically could), is an interesting thing to consider. See below:
# Before first run
regex: |.*
input: |whatever
# After first run
regex: .*|
input: whatever|
# Before second run
regex: |.*
input: whatever|
# After second run: since .* can match an empty string, it is satisfied...
regex: .*|
input: whatever|
# However, this means the regex engine matched an empty input.
# In this situation, Java's regex engine shifts one character
# further in the input before searching again.
# So, before third run, the situation is:
regex: |.*
input: whatever<|ExhaustionOfInput>
# Nothing can ever match here: the search is over
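The trace above can be approximated in Java with an explicit scanning loop. This is a sketch that mimics the "shift by one after an empty match" rule using Matcher.find(int); it is not the JDK's actual implementation, and the class EmptyMatchScan and method scan are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchScan {
    // Returns the {start, end} span of every match, advancing by hand.
    static List<int[]> scan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<int[]> spans = new ArrayList<>();
        int pos = 0;
        // find(int) throws if pos is past the end of the input, so check first.
        while (pos <= input.length() && m.find(pos)) {
            spans.add(new int[] {m.start(), m.end()});
            // After an empty match, step one character further;
            // otherwise the same empty match would be found forever.
            pos = (m.start() == m.end()) ? m.end() + 1 : m.end();
        }
        return spans;
    }

    public static void main(String[] args) {
        for (int[] span : scan(".*", "whatever")) {
            System.out.println("[" + span[0] + "," + span[1] + ")");
        }
        // Prints [0,8) and then [8,8): the third attempt would start at 9,
        // past the end of the 8-character input, so the loop stops.
    }
}
```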
Note that, as @A.H. points out in the comments, not all regex engines behave this way. GNU sed, for instance, considers the input exhausted after the first match.