New line and dollar sign in Java regular expression
Problem description
I know that $ is used to check whether a line end follows in a Java regular expression.
For the following code:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$?", "$1");
System.out.println(test_domain);
The output is:
http://www.google.com
line2
line3
I assume that the pattern (\\.[^:/]+).*$? matches the first line, which is http://www.google.com/path, and that $1 is http://www.google.com. The ? makes a reluctant match (so it matches only the first line).
However, if I remove the ? from the pattern and run the following code:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
The output is:
http://www.google.com/path
line2
line3
I thought it should give the result http://www.google.com, because:
- (\\.[^:/]+) matches http://www.google.com
- .*$ matches /path\nline2\nline3
Where is my misunderstanding of the regex here?
Recommended answer
Your regex does not match the input string at all. Without any flags, $ matches only at the very end of the string (after line3). Since you are not using the s (DOTALL) flag, . does not match newline characters, so .* cannot get there. With no match, replaceFirst returns the string unchanged, which is exactly the output you see.
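A minimal sketch to confirm this, assuming the same input string as in the question (the class and method names are only illustrative): Matcher.find() reports no match for the flag-less pattern, so replaceFirst hands back the original string.

```java
import java.util.regex.Pattern;

class NoFlagDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Returns true if the regex finds a match anywhere in INPUT.
    static boolean matches(String regex) {
        return Pattern.compile(regex).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        // Without (?s), '.' stops at '\n', so ".*$" can never reach the end of input.
        System.out.println(matches("(\\.[^:/]+).*$")); // false
        // When there is no match, replaceFirst returns the input unchanged.
        String result = INPUT.replaceFirst("(\\.[^:/]+).*$", "$1");
        System.out.println(result.equals(INPUT)); // true
    }
}
```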
Moreover, a ? quantifier after the $ end-of-line/string anchor makes no sense to the regex engine. Java accepts it, but $? can always match the empty string, so the anchor is effectively ignored: your first pattern simply matched up to the first newline, where .* stops.
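The sketch below (class and method names are hypothetical) compares the first pattern with and without the $? suffix and prints the whole match together with group 1. Both give the same result, and the match starts at the first literal dot, so group 1 is .google.com rather than http://www.google.com; the http://www prefix survives only because it lies outside the match. Note that a ? after a quantifier, as in .*?, makes that quantifier reluctant; after a single token like $ it just means "zero or one".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalDollarDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Returns {whole match, group 1} for the first match of regex in INPUT, or null if none.
    static String[] firstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? new String[] {m.group(), m.group(1)} : null;
    }

    public static void main(String[] args) {
        // "$?" compiles: the ? makes the anchor optional, so it can always match empty.
        String[] withOptional = firstMatch("(\\.[^:/]+).*$?");
        // The same pattern with the anchor removed entirely.
        String[] withoutAnchor = firstMatch("(\\.[^:/]+).*");
        System.out.println(withOptional[0] + " | " + withOptional[1]);   // .google.com/path | .google.com
        System.out.println(withoutAnchor[0] + " | " + withoutAnchor[1]); // .google.com/path | .google.com
    }
}
```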
To make it work at all, you need the s flag if you want to return just http://www.google.com:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?s)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
Output:
http://www.google.com
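If you prefer not to embed the flag in the pattern, the same result can be obtained by passing Pattern.DOTALL to Pattern.compile, which is equivalent to the inline (?s). A minimal sketch (names are illustrative); compiling the Pattern once also avoids recompiling it on every String.replaceFirst call.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Compiled once; Pattern.DOTALL lets '.' match line terminators, like (?s).
    private static final Pattern DOMAIN_THEN_REST =
            Pattern.compile("(\\.[^:/]+).*$", Pattern.DOTALL);

    // Keeps the captured ".domain" part and drops everything after it.
    static String stripWithDotAll(String input) {
        return DOMAIN_THEN_REST.matcher(input).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(stripWithDotAll(INPUT)); // http://www.google.com
    }
}
```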
With the multiline (?m) flag, $ matches at the end of each line rather than only at the end of the string. The regex looks for a literal . followed by a sequence of characters other than : and /; once one of those characters is reached, the rest of that line is removed. Since replaceFirst replaces only the first match, only the first line changes:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
Output:
http://www.google.com
line2
line3
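Because replaceFirst stops after one match, the (?m) version only ever trims one line. The sketch below uses Pattern.MULTILINE (equivalent to (?m)) and a hypothetical two-URL input chosen only for illustration, and contrasts replaceFirst with replaceAll, which trims every line containing a match.

```java
import java.util.regex.Pattern;

class MultilineDemo {
    // Compiled once with MULTILINE: '$' matches before each line terminator.
    private static final Pattern DOMAIN_THEN_REST =
            Pattern.compile("(\\.[^:/]+).*$", Pattern.MULTILINE);

    // replaceFirst trims only the first line that matches...
    static String first(String input) {
        return DOMAIN_THEN_REST.matcher(input).replaceFirst("$1");
    }

    // ...while replaceAll trims every line that matches.
    static String all(String input) {
        return DOMAIN_THEN_REST.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Hypothetical two-URL input, for illustration only.
        String input = "http://www.google.com/path\nhttp://www.example.org/x";
        System.out.println(first(input));
        System.out.println(all(input));
    }
}
```

With the question's input both methods give the same output, because line2 and line3 contain no dot and therefore no match.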