Escaping special characters in Java Regular Expressions


Question

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.

For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:

import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        String digit = "d";
        String point = ".";
        String regex1 = "\\d+\\.\\d+";
        String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

        Pattern numbers1 = Pattern.compile(regex1);
        Pattern numbers2 = Pattern.compile(regex2);

        System.out.println("Regex 1: " + regex1);

        if (numbers1.matcher("1.2").matches()) {
            System.out.println("\tMatch");
        } else {
            System.out.println("\tNo match");
        }

        System.out.println("Regex 2: " + regex2);

        if (numbers2.matcher("1.2").matches()) {
            System.out.println("\tMatch");
        } else {
            System.out.println("\tNo match");
        }
    }
}

Not surprisingly, the output produced by the above code is:

Regex 1: \d+\.\d+
    Match
Regex 2: \Qd+.d+\E
    No match

That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).
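
A minimal check of that behavior, using only the real `Pattern.quote` and `Pattern.matches` methods (the class name `QuoteLiteralDemo` is just for illustration): the quoted pattern rejects "1.2" but accepts the literal text "d+.d+".

```java
import java.util.regex.Pattern;

public class QuoteLiteralDemo {
    public static void main(String[] args) {
        // Pattern.quote wraps the whole input in \Q...\E, so every character,
        // including '+' and '.', is matched literally.
        String regex2 = Pattern.quote("d" + "+" + "." + "d" + "+");
        System.out.println(regex2);                            // \Qd+.d+\E
        System.out.println(Pattern.matches(regex2, "1.2"));    // false
        System.out.println(Pattern.matches(regex2, "d+.d+"));  // true
    }
}
```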

So, is there a method that would automatically escape each regex meta-character?

If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of

Pattern.escape('.')

would be the string "\.", but

Pattern.escape(',')

should just produce ",", since it is not a meta-character. Similarly,

Pattern.escape('d')

could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean a literal 'd', which the regex interpreter would not misread as something else, as it would with '.').
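
`java.util.regex.Pattern` has no `escape()` method, but a helper along the lines the question describes is easy to sketch. The class `RegexEscape` and its `escape` methods below are hypothetical, not part of any library. The sketch relies on the documented rule that a backslash may precede any non-alphabetic character in a Java regex, so escaping a character that only sometimes needs it (such as '-') is harmless. It covers the standard metacharacters only; it does not account for flags such as `Pattern.COMMENTS`, where whitespace and '#' also become significant.

```java
import java.util.regex.Pattern;

public class RegexEscape {
    // Characters with a special meaning somewhere in java.util.regex syntax.
    // Escaping a non-alphabetic character with a backslash is always allowed,
    // so over-escaping (e.g. '-' outside a character class) is harmless.
    private static final String METACHARACTERS = "\\^$.|?*+()[]{}-";

    // Hypothetical helper: prefixes c with a backslash if it is a regex
    // metacharacter, otherwise returns c unchanged.
    public static String escape(char c) {
        return METACHARACTERS.indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    // Hypothetical helper: applies escape(char) to every character of s.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (char c : s.toCharArray()) {
            sb.append(escape(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape('.'));  // \.
        System.out.println(escape(','));  // ,
        System.out.println(escape('d'));  // d  (letters are left alone)

        // "Dynamically" build \d+\.\d+ from pieces.
        String regex = "\\d+" + escape('.') + "\\d+";
        System.out.println(regex);                          // \d+\.\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
    }
}
```

The sketch deliberately leaves letters untouched: prefixing 'd' with a backslash would turn a literal d into the digit class `\d`, which is the opposite of escaping.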

Answer

I'm not 100% sure this is what you are asking here. If you are looking for a way to create constants that you can use in your regex patterns then just prepending them with "\\" would work:

String digit = "\\d";
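
Applied to the question's example, that approach could look like the sketch below. The constant names `DIGIT` and `POINT` and the class name are illustrative assumptions; each constant already carries the backslash it needs, so plain concatenation yields `\d+\.\d+`.

```java
import java.util.regex.Pattern;

public class RegexConstants {
    // Illustrative constants: each already contains its backslash.
    static final String DIGIT = "\\d";  // regex \d : any digit
    static final String POINT = "\\.";  // regex \. : a literal '.'

    public static void main(String[] args) {
        String regex = DIGIT + "+" + POINT + DIGIT + "+";
        System.out.println(regex);                          // \d+\.\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
        System.out.println(Pattern.matches(regex, "12"));   // false
    }
}
```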

There is no Pattern method that I know of that does this for you. Unfortunately, although there is "\\d" for digits, "\\w" for word characters, etc., there is also () for grouping, + and * for repetition, etc. There is no common way to deal with each of the parts of a regular expression.

In your post you use the Pattern.quote(string) method. You probably know that this wraps your string between "\\Q" and "\\E", so you can match a string literally even if it happens to contain special regex characters (+, ., \\d, etc.).
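
Because `Pattern.quote` makes everything it wraps literal, one way to use it when building a pattern dynamically is to quote only the literal pieces and leave the regex constructs unquoted. A minimal sketch (class name is illustrative):

```java
import java.util.regex.Pattern;

public class QuoteOnlyLiterals {
    public static void main(String[] args) {
        // Quote only the part that must be literal; leave \d+ unquoted so it
        // keeps its meaning of "one or more digits".
        String literal = ".";
        String regex = "\\d+" + Pattern.quote(literal) + "\\d+";
        System.out.println(regex);                          // \d+\Q.\E\d+
        System.out.println(Pattern.matches(regex, "1.2"));  // true
        System.out.println(Pattern.matches(regex, "1x2"));  // false
    }
}
```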
