String.replaceAll(regex) makes the same replacement twice

Question

Can anyone tell me why

System.out.println("test".replaceAll(".*", "a"));

results in

aa

Note that the following gives the same result:

System.out.println("test".replaceAll(".*$", "a"));

I have tested this on Java 6 and 7, and both seem to behave the same way. Am I missing something, or is this a bug in the Java regex engine?

Recommended answer

This is not an anomaly: .* can match anything, including the empty string.

You ask to replace all occurrences:

  • the first occurrence matches the whole string, so the regex engine starts looking for the next match at the end of the input;
  • but .* also matches an empty string! It therefore matches the empty string at the end of the input and replaces it with a.
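
To see the two matches directly, one can iterate with Matcher.find() and record where each match starts and ends. This is only a minimal sketch: the class name MatchSpans and the helper spans are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Hypothetical helper: returns "start-end" for every match that find() reports.
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Whole string first, then the empty string at the end of the input.
        System.out.println(spans(".*", "test"));   // [0-4, 4-4]
        // Anchoring with $ does not help: the empty match at index 4 also ends at $.
        System.out.println(spans(".*$", "test"));  // [0-4, 4-4]
    }
}
```

The second span, 4-4, is the empty match that produces the extra a.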

Using .+ instead does not exhibit this problem, since that regex cannot match an empty string (it needs at least one character to match).

Or use .replaceFirst() to replace only the first occurrence:

"test".replaceFirst(".*", "a")
       ^^^^^^^^^^^^
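
The two workarounds give the same result on "test" but differ on an empty input, which may matter depending on what you want. The sketch below compares them; the class name Alternatives and its two helper methods are illustrative only.

```java
class Alternatives {
    // .+ needs at least one character, so it never matches the empty string.
    static String withPlus(String s) {
        return s.replaceAll(".+", "a");
    }

    // .* can match the empty string, but replaceFirst stops after one match.
    static String withReplaceFirst(String s) {
        return s.replaceFirst(".*", "a");
    }

    public static void main(String[] args) {
        System.out.println(withPlus("test"));          // a
        System.out.println(withReplaceFirst("test"));  // a
        // On an empty input the two approaches diverge:
        System.out.println("[" + withPlus("") + "]");          // []
        System.out.println("[" + withReplaceFirst("") + "]");  // [a]
    }
}
```

If an empty input should stay empty, .+ is the closer fit; if it should also become a, replaceFirst with .* does that.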

Now, why .* behaves the way it does, and why it does not match more than twice (which it theoretically could), is interesting to consider. See below:

# Before first run
regex: |.*
input: |whatever
# After first run
regex: .*|
input: whatever|
# Before second run
regex: |.*
input: whatever|
# After second run: since .* can match an empty string, it is satisfied...
regex: .*|
input: whatever|
# However, this means the regex engine matched an empty input.
# Java's regex engine, in this situation, shifts
# one character further in the input.
# So, before third run, the situation is:
regex: |.*
input: whatever<|ExhaustionOfInput>
# Nothing can ever match here: out
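
The trace above can be approximated in Java by driving the search position by hand. This is a rough emulation written for illustration, not the JDK's actual implementation: the class EmptyMatchScan and the method scan are hypothetical, and restarting the search with Matcher.region() is an assumption that is good enough for simple patterns like .* but changes how anchors and lookarounds see the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchScan {
    // Hypothetical emulation of a "replace all" scan: after an empty match,
    // move one character further so the same position is not matched forever.
    static List<String> scan(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        int from = 0;
        while (from <= input.length()) {
            m.region(from, input.length());   // restart the search at 'from'
            if (!m.find()) {
                break;
            }
            spans.add(m.start() + "-" + m.end());
            // Empty match: shift one character further; otherwise continue at its end.
            from = (m.start() == m.end()) ? m.end() + 1 : m.end();
        }
        return spans;
    }

    public static void main(String[] args) {
        // First run consumes "whatever", second run matches the empty string at
        // index 8, and the third run would start at index 9, past the input.
        System.out.println(scan(".*", "whatever"));  // [0-8, 8-8]
    }
}
```

The loop stops after the second match because the shifted start position is past the end of the input, which is the "exhaustion of input" state in the trace.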

Note that, as @A.H. points out in the comments, not all regex engines behave this way. GNU sed, for instance, considers the input exhausted after the first match.
