String.replaceAll(regex) makes the same replacement twice
Question
Can anyone tell me why
System.out.println("test".replaceAll(".*", "a"));
results in
aa
Note that the following gives the same result:
System.out.println("test".replaceAll(".*$", "a"));
I have tested this on Java 6 and 7, and both seem to behave the same way. Am I missing something, or is this a bug in the Java regex engine?
Recommended answer
This is not an anomaly: .* can match anything.
You ask to replace all occurrences:
- the first occurrence matches the whole string, so the regex engine starts looking for the next match at the end of the input;
- but .* also matches an empty string! It therefore matches the empty string at the end of the input and replaces it with a.
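The two matches described above can be made visible by iterating with Matcher.find() and printing each match's span. This is only a minimal sketch: the class name EmptyMatchDemo and the helper matchSpans are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Hypothetical helper: returns one "[start,end)" entry per match
    // that Matcher.find() reports for the given regex and input.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        // .* matches "test" and then the empty string at the end of the input.
        System.out.println(matchSpans(".*", "test")); // [[0,4), [4,4)]
        // .+ needs at least one character, so there is no second match.
        System.out.println(matchSpans(".+", "test")); // [[0,4)]
    }
}
```

The second span, [4,4), is the empty match at the end of the input; replaceAll() replaces it as well, which is where the extra a comes from.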
Using .+ instead will not exhibit this problem, since that regex cannot match an empty string (it requires at least one character to match).
Or use .replaceFirst() to replace only the first occurrence:
"test".replaceFirst(".*", "a")
       ^^^^^^^^^^^^
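As a quick side-by-side check of the two suggested fixes, the sketch below runs the original call and both alternatives. The class name ReplaceVariants and its helper methods are hypothetical names chosen for this example. The empty-string case is included because that is where the two fixes differ.

```java
class ReplaceVariants {
    // Original call from the question: .* also matches the trailing empty string.
    static String allStar(String s) { return s.replaceAll(".*", "a"); }

    // Fix 1: .+ cannot match an empty string.
    static String allPlus(String s) { return s.replaceAll(".+", "a"); }

    // Fix 2: replaceFirst() stops after the first match.
    static String firstStar(String s) { return s.replaceFirst(".*", "a"); }

    public static void main(String[] args) {
        System.out.println(allStar("test"));   // aa
        System.out.println(allPlus("test"));   // a
        System.out.println(firstStar("test")); // a

        // On empty input the fixes differ: .+ finds nothing to replace,
        // while .* matches the empty string once.
        System.out.println("[" + allPlus("") + "]");   // []
        System.out.println("[" + firstStar("") + "]"); // [a]
    }
}
```

Which fix fits depends on whether an empty input should produce the replacement at all.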
Now, why .* behaves the way it does, and why it does not match more than twice (it theoretically could), is an interesting thing to consider. See below:
# Before first run
regex: |.*
input: |whatever
# After first run
regex: .*|
input: whatever|
# Before second run
regex: |.*
input: whatever|
# After second run: since .* can match an empty string, it is satisfied...
regex: .*|
input: whatever|
# However, this means the regex engine matched an empty input.
# Java's regex engine (like most, though not all), in this situation,
# shifts one character further in the input.
# So, before third run, the situation is:
regex: |.*
input: whatever<|ExhaustionOfInput>
# Nothing can ever match here: out
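The trace above can be approximated in code. The following is a rough sketch of a replace-all loop that makes the "shift one character after an empty match" step explicit. It is not the JDK's actual implementation: the class ReplaceAllSketch and the method replaceAllSketch are hypothetical, and the replacement is treated as literal text (no $1 group references).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllSketch {
    // Hypothetical re-implementation for illustration only.
    static String replaceAllSketch(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0;  // end of the last text already written to out
        int searchFrom = 0;  // where the next search starts
        while (searchFrom <= input.length() && m.find(searchFrom)) {
            // Copy the unmatched text before this match, then the replacement.
            out.append(input, copiedUpTo, m.start()).append(replacement);
            copiedUpTo = m.end();
            // After an empty match, move one character further so the
            // same empty match is not found again.
            searchFrom = (m.end() == m.start()) ? m.end() + 1 : m.end();
        }
        out.append(input, copiedUpTo, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllSketch("test", ".*", "a")); // aa
        // x* matches only empty strings here, one at every position.
        System.out.println(replaceAllSketch("test", "x*", "-")); // -t-e-s-t-
        System.out.println("test".replaceAll("x*", "-"));        // -t-e-s-t-
    }
}
```

For .* the second search starts at the end of the input, finds the empty match, and the forced shift then moves past the end, so there is no third match.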
Note that, as @A.H. points out in the comments, not all regex engines behave this way. GNU sed, for instance, will consider that it has exhausted the input after the first match.