String.replaceAll(regex) makes the same replacement twice

Question

Can anyone tell me why

System.out.println("test".replaceAll(".*", "a"));

results in

aa

Note that the following has the same result:

System.out.println("test".replaceAll(".*$", "a"));

I have tested this on Java 6 and 7, and both behave the same way. Am I missing something, or is this a bug in the Java regex engine?

Recommended answer

This is not an anomaly: .* can match anything.

You ask to replace all occurrences:

  • the first match covers the whole string, so the regex engine starts looking for the next match at the end of the input;
  • but .* also matches an empty string! It therefore matches the empty string at the end of the input and replaces it with a as well.
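The two matches described above can be made visible by driving the matcher by hand. This is a minimal sketch; the class name EmptyMatchDemo and the helper listMatches are made up for illustration, while Pattern and Matcher are the standard java.util.regex classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: lists every match of regex in input as "[start,end) 'text'".
    public static List<String> listMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add("[" + m.start() + "," + m.end() + ") '" + m.group() + "'");
        }
        return found;
    }

    public static void main(String[] args) {
        // Two matches: the whole string, then an empty match at the end of the input.
        System.out.println(listMatches(".*", "test"));
        // prints [[0,4) 'test', [4,4) '']
    }
}
```

The second entry has start equal to end, which is the empty match that produces the extra a.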

Using .+ instead avoids the problem, since that regex cannot match an empty string (it needs at least one character to match).

Or, use .replaceFirst() to replace only the first occurrence:

"test".replaceFirst(".*", "a")
       ^^^^^^^^^^^^
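
Both workarounds can be checked side by side with the original call. This is only a sketch; the class name WorkaroundDemo and its two helper methods are invented for the example and simply wrap String.replaceAll and String.replaceFirst.

```java
public class WorkaroundDemo {
    // Hypothetical helper: replaces every match of regex in input with "a".
    public static String replaceEvery(String input, String regex) {
        return input.replaceAll(regex, "a");
    }

    // Hypothetical helper: replaces only the first match of regex in input with "a".
    public static String replaceOnce(String input, String regex) {
        return input.replaceFirst(regex, "a");
    }

    public static void main(String[] args) {
        System.out.println(replaceEvery("test", ".*")); // aa (whole string, then empty match)
        System.out.println(replaceEvery("test", ".+")); // a  (no empty match possible)
        System.out.println(replaceOnce("test", ".*"));  // a  (stops after the first match)
    }
}
```

One difference to keep in mind: on an empty input, .* still matches once and yields a, whereas .+ matches nothing and leaves the string empty.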

Now, why .* behaves the way it does, and why it does not match more than twice (it theoretically could), is an interesting thing to consider. See below:

# Before first run
regex: |.*
input: |whatever
# After first run
regex: .*|
input: whatever|
# Before second run
regex: |.*
input: whatever|
# After second run: since .* can match an empty string, it is satisfied...
regex: .*|
input: whatever|
# However, this means the regex engine matched an empty string.
# Java's regex engine (like most others), in this situation, will shift
# one character further in the input.
# So, before third run, the situation is:
regex: |.*
input: whatever<|ExhaustionOfInput>
# Nothing can ever match here: the search stops
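
The runs traced above correspond to successive calls to Matcher.find(). The following is a rough sketch of a find-and-splice loop in the spirit of replaceAll, not the JDK's actual source; the class and method name ReplaceAllSketch / replaceAllSketch are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllSketch {
    // Hypothetical re-implementation: find each match in turn and splice in the replacement.
    public static String replaceAllSketch(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        // For ".*" on "whatever", find() succeeds twice (whole input, then the
        // empty string at the end) and fails on the third call.
        while (m.find()) {
            // Copies the unmatched text before this match, then the replacement.
            m.appendReplacement(out, replacement);
        }
        // Copies whatever is left after the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllSketch("whatever", ".*", "a")); // aa
    }
}
```

Each successful find() contributes one replacement, so two successful runs give two copies of a.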

Note that, as @A.H. points out in the comments, not all regex engines behave this way. GNU sed, for instance, considers the input exhausted after the first match.
