Escaping special characters in Java Regular Expressions
Question
Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?
This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.
For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:
String digit = "d";
String point = ".";
String regex1 = "\\d+\\.\\d+";
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);

System.out.println("Regex 1: " + regex1);
if (numbers1.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

System.out.println("Regex 2: " + regex2);
if (numbers2.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}
Not surprisingly, the output produced by the above code is:
Regex 1: \d+\.\d+
Match
Regex 2: \Qd+.d+\E
No match
That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).
So, is there a method that would automatically escape each regex meta-character?
If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of
Pattern.escape('.')
would be the string "\.", but
Pattern.escape(',')
should just produce ",", since it is not a meta-character. Similarly,
Pattern.escape('d')
could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean literal 'd', which wouldn't be misunderstood by the regex interpreter to be something else, as would be the case with '.').
Recommended answer
I'm not 100% sure this is what you are asking here. If you are looking for a way to create constants that you can use in your regex patterns then just prepending them with "\\" would work:
String digit = "\\d";
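As a rough sketch of the constants approach, the regex from the question could be assembled from pre-escaped fragments. The class name RegexConstants, the constants DIGIT and POINT, and the method decimalRegex() are hypothetical names chosen for illustration; they are not part of java.util.regex.

```java
import java.util.regex.Pattern;

public class RegexConstants {
    // Hypothetical constants: each holds an already-escaped regex fragment.
    public static final String DIGIT = "\\d";  // regex \d : any digit
    public static final String POINT = "\\.";  // regex \. : a literal dot

    // Concatenates the fragments into the regex \d+\.\d+
    public static String decimalRegex() {
        return DIGIT + "+" + POINT + DIGIT + "+";
    }

    public static void main(String[] args) {
        String regex = decimalRegex();
        System.out.println("Regex: " + regex);              // Regex: \d+\.\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
        System.out.println(Pattern.matches(regex, "1x2"));  // false
    }
}
```

The escaping is done once, where each constant is defined, so the code that combines them does not need to know which pieces are metacharacters.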
There is no Pattern method that I know of that does this for you. Unfortunately, although there is "\\d" for digits, "\\w" for word characters, etc., there is also () for grouping, + and * for repeats, etc. There is not a common way to deal with each of the parts of a regular expression.
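Since no built-in method does this, one option is a small hand-written helper. The following is only a sketch: the class MetaEscaper, the method escapeMeta(), and the META character list are assumptions made for illustration, not an official API or an authoritative list of metacharacters. It prefixes punctuation metacharacters with a backslash and leaves letters such as 'd' untouched, because a backslash before a letter changes its meaning rather than making it literal.

```java
import java.util.regex.Pattern;

public class MetaEscaper {
    // Characters this sketch treats as metacharacters (an assumed list).
    private static final String META = "\\.[]{}()<>*+-=!?^$|";

    // Hypothetical helper: puts a backslash in front of each metacharacter.
    public static String escapeMeta(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeMeta("."));  // the dot gets a leading backslash
        System.out.println(escapeMeta(","));  // the comma is left unchanged
        // The escaped dot can be combined with live regex fragments.
        String regex = "\\d+" + escapeMeta(".") + "\\d+";
        System.out.println(Pattern.matches(regex, "1.2"));
    }
}
```

Java allows a backslash before any non-alphabetic character, so escaping a character from the list is harmless even where it would not have been special.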
In your post you use the Pattern.quote(string) method (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#quote%28java.lang.String%29). You probably know that this wraps your pattern between "\\Q" and "\\E" so you can match a string even if it happens to have a special regex character in it (+, ., \\d, etc.).
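One way to combine the two ideas, sketched here under the assumption that only some pieces of the pattern are meant literally, is to pass just those pieces through Pattern.quote() and concatenate them with ordinary regex fragments. The class name QuoteLiteralParts and the method decimalRegex(String) are hypothetical.

```java
import java.util.regex.Pattern;

public class QuoteLiteralParts {
    // Quotes only the literal separator; the digit parts stay live regex.
    public static String decimalRegex(String separator) {
        return "\\d+" + Pattern.quote(separator) + "\\d+";
    }

    public static void main(String[] args) {
        String regex = decimalRegex(".");
        System.out.println("Regex: " + regex);              // Regex: \d+\Q.\E\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
        System.out.println(Pattern.matches(regex, "1x2"));  // false
    }
}
```

In the question, the whole string d+.d+ was quoted, which is why regex2 matched only that literal text. Quoting the separator alone keeps \d+ working as a regex construct.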