Java, regular expressions: need to escape backslash in regex

Question

With reference to below question - String.replaceAll single backslashes with double backslashes

I wrote a test program, and I found that the result is true in both cases, whether I escape the backslash or not. This may be because:
- \t is a recognized Java String escape sequence. (Try \s and it would complain.)
- \t is taken as a literal tab in the regex.
I am somewhat unsure of the reasons.

Is there any general guideline about escaping regexes in Java? I think using two backslashes is the correct approach.

I would still like to know your opinions.

public class TestDeleteMe {

  public static void main(String args[]) {
    System.out.println(System.currentTimeMillis());

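    String str1 = "a\tb"; // tab between a and b, written as an escape so it survives copy/paste

    // pattern - a and b with any number of spaces or tabs between
    System.out.println("matches = " + str1.matches("^a[ \\t]*b$")); // regex engine receives \t
    System.out.println("matches = " + str1.matches("^a[ \t]*b$")); // regex engine receives a literal tab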
  }
}


Recommended answer

There are two interpretations of escape sequences going on: first by the Java compiler, and then by the regex engine. When the Java compiler sees two backslashes in a string literal, it replaces them with a single backslash. When a t follows a single backslash, the compiler replaces the pair with a tab character; when a t follows a double backslash, the compiler leaves the t alone. However, because the two backslashes have been replaced by a single backslash, the regex engine sees \t and interprets it as a tab. Either way the pattern matches a tab, which is why both calls print true.
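The two layers can be made visible by looking at what the string literal actually contains before the regex engine sees it. The following is only a small sketch; the class name EscapeLevels and the helper matchesTab are made up for illustration and are not part of the original test program.

```java
import java.util.regex.Pattern;

class EscapeLevels {
    // The compiler has already turned \t into a real tab: one character.
    static final String COMPILER_TAB = "\t";
    // The compiler turned \\ into one backslash: two characters, '\' and 't'.
    static final String REGEX_TAB = "\\t";

    // Hypothetical helper: builds the pattern from the question around the
    // given fragment and tests it against "a<tab>b".
    static boolean matchesTab(String fragment) {
        return Pattern.matches("^a[ " + fragment + "]*b$", "a\tb");
    }

    public static void main(String[] args) {
        System.out.println(COMPILER_TAB.length());    // 1
        System.out.println(REGEX_TAB.length());       // 2
        System.out.println(matchesTab(COMPILER_TAB)); // true
        System.out.println(matchesTab(REGEX_TAB));    // true
    }
}
```

The lengths differ because only the second string still carries a backslash for the regex engine to interpret; the match results are the same because both forms end up meaning "tab" inside the character class.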

I think that it is cleaner to let the regex engine interpret \t as a tab (i.e. write "\\t" in Java), because it lets you see the expression in its intended form during debugging, logging, etc. If you convert a Pattern built from "\t" to a string, you will see a raw tab character in the middle of your regular expression, and may mistake it for other whitespace. Patterns written with "\\t" do not have this problem: they show you a \t with a single backslash, telling you exactly the kind of whitespace that they match.
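The logging difference can be checked with Pattern.toString(), which returns the string the pattern was compiled from. This is a sketch under that assumption only; the class name PatternLogging and the helper shown are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;

class PatternLogging {
    // Returns the pattern text as it would appear in a log line or debugger.
    static String shown(String regex) {
        return Pattern.compile(regex).toString();
    }

    public static void main(String[] args) {
        // Brackets make the otherwise invisible raw tab easier to spot.
        System.out.println("[" + shown("a[ \t]*b") + "]");  // contains a raw tab
        System.out.println("[" + shown("a[ \\t]*b") + "]"); // contains backslash + t
    }
}
```

In the first line of output the character class appears to hold nothing but whitespace, while the second line spells out \t, which is the readability argument made above.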
