Escaping special characters in Java Regular Expressions

Question

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.

For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:

import java.util.regex.Pattern;

// Each regex backslash must be doubled inside a Java string literal.
String digit = "\\d";
String point = "\\.";
String regex1 = "\\d+\\.\\d+";
// Pattern.quote turns the whole concatenated string into a literal.
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);

System.out.println("Regex 1: " + regex1);

if (numbers1.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

System.out.println("Regex 2: " + regex2);

if (numbers2.matcher("1.2").matches()) {
    System.out.println("\tMatch");
} else {
    System.out.println("\tNo match");
}

Not surprisingly, the output produced by the above code is:

Regex 1: \d+\.\d+
    Match
Regex 2: \Q\d+\.\d+\E
    No match

That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string \d+\.\d+).

So, is there a method that would automatically escape each regex meta-character?

If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of

Pattern.escape('.')

would be the string "\.", but

Pattern.escape(',')

should just produce ",", since it is not a meta-character. Similarly,

Pattern.escape('d')

could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean literal 'd', which wouldn't be misunderstood by the regex interpreter to be something else, as would be the case with '.').

Recommended answer

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

If you are looking for a way to create constants that you can use in your regex patterns, then just prepending them with "\\" should work, but there is no nice Pattern.escape('.') function to help with this.

So if you are trying to match the literal text \d (a backslash followed by the letter d, rather than a decimal digit), then you would do:

// this will match the literal text \d as opposed to a decimal digit
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";

The 4 backslashes in the Java string turn into 2 backslashes in the regex pattern. Two backslashes in a regex pattern match the backslash itself. Prepending a backslash to any special character turns it into a normal character instead of a special one.
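The two layers of escaping (Java string literal first, regex second) can be checked directly. The following is only a small sketch; the class name BackslashLayers and its constant names are made up for illustration.

```java
class BackslashLayers {
    // Java source "\\\\d" is the 3-character string \\d: regex for "a backslash, then d".
    static final String MATCH_BACKSLASH_D = "\\\\d";
    // Java source "\\d" is the 2-character string \d: regex for "any decimal digit".
    static final String MATCH_DECIMAL_DIGIT = "\\d";

    public static void main(String[] args) {
        System.out.println(MATCH_BACKSLASH_D.length());            // 3
        System.out.println(MATCH_DECIMAL_DIGIT.length());          // 2
        // The input "\\d" below is the two-character text \d.
        System.out.println("\\d".matches(MATCH_BACKSLASH_D));      // true
        System.out.println("7".matches(MATCH_BACKSLASH_D));        // false
        System.out.println("7".matches(MATCH_DECIMAL_DIGIT));      // true
        System.out.println("\\d".matches(MATCH_DECIMAL_DIGIT));    // false
    }
}
```

Printing the length of each string is a quick way to see how many backslashes actually reach the regex engine.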

String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
... 
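Since the JDK has no Pattern.escape, a helper along these lines could be written by hand. This is a rough sketch, not a standard API: the class RegexEscaper, its escape methods, and the chosen set of metacharacters are all assumptions made for illustration. It relies on the documented rule that a backslash before a non-alphabetic character is always allowed in java.util.regex.

```java
class RegexEscaper {
    // Characters treated as metacharacters here; this set is an assumption for illustration.
    private static final String META = "\\.[]{}()<>*+-=!?^$|";

    /** Hypothetical helper: prefixes a backslash only when c is in META. */
    static String escape(char c) {
        return META.indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    /** Applies escape(char) to every character of a literal string. */
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            sb.append(escape(literal.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape('.'));              // \.
        System.out.println(escape(','));              // ,
        String regex = "\\d+" + escape('.') + "\\d+";
        System.out.println(regex);                    // \d+\.\d+
        System.out.println("1.2".matches(regex));     // true
        System.out.println("1x2".matches(regex));     // false
    }
}
```

Letters such as 'd' are deliberately left alone: an unescaped 'd' already means the literal letter, and a backslash before a letter would change its meaning or be rejected.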

In your post you use the Pattern.quote(string) method. This method wraps your pattern between \Q and \E so that the whole string is matched literally, even if it happens to contain special regex characters (+, ., \d, etc.). That is why regex2 in the question matches only the literal text \d+\.\d+.
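One way to build such a pattern dynamically is to quote only the pieces that are meant literally and leave the regex fragments unquoted. Here is a minimal sketch; the class QuoteLiteralParts and the method numberWith are hypothetical names, while Pattern.quote and Pattern.compile are the standard JDK methods.

```java
import java.util.regex.Pattern;

class QuoteLiteralParts {
    /** Builds "digits, literal separator, digits"; only the separator is quoted. */
    static Pattern numberWith(String separator) {
        String digits = "\\d+"; // regex fragment, intentionally left unquoted
        return Pattern.compile(digits + Pattern.quote(separator) + digits);
    }

    public static void main(String[] args) {
        Pattern p = numberWith(".");
        System.out.println(p.pattern());                  // \d+\Q.\E\d+
        System.out.println(p.matcher("1.2").matches());   // true
        System.out.println(p.matcher("1x2").matches());   // false
    }
}
```

Because the separator is quoted, it can contain any metacharacters without affecting the rest of the pattern.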
