Java regex escaped characters
Question
When matching certain characters (such as a line feed), you can use the regex "\\n" or indeed just "\n". For example, the following splits a string into an array of lines:
String[] lines = allContent.split("\\r?\\n");
But the following works just as well:
String[] lines = allContent.split("\r?\n");
My question:
Do the above two work in exactly the same way, or is there any subtle difference? If the latter, can you give an example case where you get different results?
Or is there a difference only in [possible/theoretical] performance?
Recommended answer
There is no difference in the current scenario. Ordinary string escape sequences are formed with a single backslash followed by a valid escape character ("\n", "\r", etc.), whereas regex escape sequences are formed with a literal backslash (that is, a double backslash in the Java string literal) followed by a valid regex escape character ("\\n", "\\d", etc.).
"\n" (a string escape sequence) is a literal LF (newline), and "\\n" is a regex escape sequence that matches an LF symbol.
"\r" (a string escape sequence) is a literal CR (carriage return), and "\\r" is a regex escape sequence that matches a CR symbol.
"\t" (a string escape sequence) is a literal tab symbol, and "\\t" is a regex escape sequence that matches a tab symbol.
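To make the two escape levels concrete, here is a minimal sketch. The class name `EscapeLevels`, the helper `splitLines`, and the sample input are hypothetical and only serve the illustration. It shows that the two string literals hand different character sequences to the regex engine, yet split the same input identically.

```java
import java.util.Arrays;

public class EscapeLevels {
    // Hypothetical helper: splits the content using the given regex source.
    static String[] splitLines(String content, String regex) {
        return content.split(regex);
    }

    public static void main(String[] args) {
        // The Java compiler resolves \r and \n, so the regex engine
        // receives three characters: CR, '?', LF.
        String compilerLevel = "\r?\n";
        // Here the regex engine receives five characters:
        // backslash, 'r', '?', backslash, 'n', and resolves the escapes itself.
        String regexLevel = "\\r?\\n";

        System.out.println(compilerLevel.length()); // 3
        System.out.println(regexLevel.length());    // 5

        String content = "one\r\ntwo\nthree"; // assumed sample input
        String[] a = splitLines(content, compilerLevel);
        String[] b = splitLines(content, regexLevel);

        System.out.println(Arrays.toString(a));  // [one, two, three]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Either way the compiled pattern matches an optional CR followed by an LF; only the stage at which the escape is resolved differs.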
See the Java regex docs for the list of supported regex escapes.
However, if you use the Pattern.COMMENTS flag (which lets you add comments and format a pattern nicely by making the regex engine ignore all unescaped whitespace in the pattern), you will need to write either "\\n" or "\\\n" in the Java string literal to define a newline (LF), and either "\\r" or "\\\r" to define a carriage return (CR).
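As a rough illustration of the flag's effect, the sketch below compiles three spellings of "a, LF, b" with Pattern.COMMENTS. The class name `CommentsFlagDemo` and the helper `matchesWithComments` are hypothetical. `Pattern.compile(String, int)` and `Pattern.COMMENTS` are the standard `java.util.regex` API.

```java
import java.util.regex.Pattern;

public class CommentsFlagDemo {
    // Hypothetical helper: compiles the regex in COMMENTS mode and
    // checks whether it matches the entire input.
    static boolean matchesWithComments(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "a\nb"; // 'a', LF, 'b'

        // Regex escape \n: survives COMMENTS mode and matches the LF.
        System.out.println(matchesWithComments("a\\nb", input));  // true

        // Backslash followed by a raw LF: the LF is escaped, so it is kept.
        System.out.println(matchesWithComments("a\\\nb", input)); // true

        // Raw, unescaped LF: treated as ignorable whitespace,
        // so the pattern is effectively "ab".
        System.out.println(matchesWithComments("a\nb", input));   // false
        System.out.println(matchesWithComments("a\nb", "ab"));    // true
    }
}
```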
See the Java test:
String s = "\n";
System.out.println(s.replaceAll("\n", "LF")); // => LF
System.out.println(s.replaceAll("\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\\\n", "LF")); // => LF
System.out.println(s.replaceAll("(?x)\n", "<LF>"));
// => <LF>
//<LF>
Why does the last one produce <LF> + newline + <LF>? Because "(?x)\n" is equal to "", an empty pattern: the unescaped LF is ignored as whitespace, so the pattern matches the empty string before the newline and the empty string after it.
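One way to check this explanation is to compare the result against an explicitly empty pattern. The sketch below does that. The class name `EmptyPatternDemo` and the helper `mark` are hypothetical names chosen for the example.

```java
public class EmptyPatternDemo {
    // Hypothetical helper: replaces every match of the regex with the marker "<LF>".
    static String mark(String input, String regex) {
        return input.replaceAll(regex, "<LF>");
    }

    public static void main(String[] args) {
        String s = "\n";

        // In (?x) mode the raw LF is dropped, leaving an empty pattern.
        String withComments = mark(s, "(?x)\n");
        // An explicitly empty pattern matches at every position in the input.
        String withEmpty = mark(s, "");

        System.out.println(withComments.equals(withEmpty));     // true
        System.out.println(withEmpty.equals("<LF>\n<LF>"));     // true

        // The same effect on ordinary text: a match before, between, and after the characters.
        System.out.println(mark("ab", ""));                     // <LF>a<LF>b<LF>
    }
}
```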