New line and dollar sign in Java regular expression

Question

I know $ is used to check if a line end follows in a Java regular expression.

For the following code:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$?", "$1");
System.out.println(test_domain);

The output is:

http://www.google.com
line2
line3

I assume that the pattern (\\.[^:/]+).*$? matches the first line, which is http://www.google.com/path, and that $1 is http://www.google.com. The ? makes a reluctant match (so it matches only the first line).

However, if I remove the ? from the pattern and run the following code:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

The output is:

http://www.google.com/path
line2
line3

I think it should give the result http://www.google.com:

  1. (\\.[^:/]+) matches http://www.google.com
  2. .*$ matches /path\nline2\nline3

Where is my misunderstanding of the regex here?

Recommended answer

Your regex does not match the input string. Without flags, $ matches only at the very end of the string (at the end of line3). Since you are not using the s (DOTALL) flag, the . cannot match the newline characters, so .* can never get there.
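
The two default behaviours involved can be checked in isolation. The following is a minimal sketch; the class name DollarNoFlagsDemo and the helper finds are made up for illustration, and only java.util.regex.Pattern from the standard library is used.

```java
import java.util.regex.Pattern;

class DollarNoFlagsDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // True if the regex (compiled with no flags) matches anywhere in INPUT.
    static boolean finds(String regex) {
        return Pattern.compile(regex).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        // "path" is followed by \n, not by the end of the input
        System.out.println(finds("path$"));           // false
        // "line3" really is at the end of the input
        System.out.println(finds("line3$"));          // true
        // '.' does not match \n by default
        System.out.println(finds("path.line2"));      // false
        // the pattern from the question therefore finds nothing
        System.out.println(finds("(\\.[^:/]+).*$"));  // false
    }
}
```

Because nothing matches, replaceFirst returns the input unchanged, which is the output shown in the question.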

Moreover, putting a ? quantifier after the $ end of line/string anchor makes no sense. Java accepts it, but it only makes the anchor optional, so the $ is effectively ignored.
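
One way to see this is to compare the question's first pattern with the same pattern without any $ at all. This is only a sketch: OptionalDollarDemo and strip are hypothetical names, and the input is the string from the question.

```java
class OptionalDollarDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Applies the same replaceFirst call as in the question.
    static String strip(String regex) {
        return INPUT.replaceFirst(regex, "$1");
    }

    public static void main(String[] args) {
        String withOptionalDollar = strip("(\\.[^:/]+).*$?");
        String withoutDollar = strip("(\\.[^:/]+).*");

        // Both calls only remove "/path" from the first line.
        System.out.println(withOptionalDollar.equals(withoutDollar)); // true
        System.out.println(withOptionalDollar);
    }
}
```

So the first snippet in the question works because the $ is optional and never has to match, not because the ? makes anything reluctant.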

To make it work at all, you need to use the s flag if you just want to return http://www.google.com:

String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?s)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);

Output of this demo:

http://www.google.com

With the multiline (?m) flag, $ also matches at the end of each line. The regex looks for a literal . followed by a sequence of characters other than : and /. Once one of those characters is found, the rest of the characters on that line are removed.

    String test_domain = "http://www.google.com/path\nline2\nline3";
    test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
    System.out.println(test_domain);

Output of this IDEONE demo:

http://www.google.com
line2
line3
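
The three variants can also be compared side by side by passing the flags as Pattern constants instead of inline (?s) / (?m). This is a sketch under the same assumptions as the snippets above; FlagComparisonDemo, replaceWith and firstGroup are illustrative names. It also shows that group 1 captures only .google.com, while the http://www prefix survives because it lies before the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagComparisonDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";
    static final String REGEX = "(\\.[^:/]+).*$";

    // Same replacement as in the answer, with the flags given as Pattern constants.
    static String replaceWith(int flags) {
        return Pattern.compile(REGEX, flags).matcher(INPUT).replaceFirst("$1");
    }

    // Text captured by group 1, or null when the pattern does not match.
    static String firstGroup(int flags) {
        Matcher m = Pattern.compile(REGEX, flags).matcher(INPUT);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // No flags: no match, so the input comes back unchanged
        System.out.println(replaceWith(0).equals(INPUT));        // true
        // DOTALL (same as (?s)): .* runs to the end of the input
        System.out.println(replaceWith(Pattern.DOTALL));         // http://www.google.com
        // MULTILINE (same as (?m)): only the rest of the first line is removed
        System.out.println(replaceWith(Pattern.MULTILINE));
        // Group 1 starts at the first literal dot
        System.out.println(firstGroup(Pattern.DOTALL));          // .google.com
    }
}
```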
