Regex backreferences in Java

Question

I had to match a digit followed by itself 14 times, and I came up with the following regular expression in the regexstor.net/tester:

(\d)\1{14}
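A minimal Java sketch of the pattern above, assuming the goal is to test whether an entire string is a single digit repeated 15 times (the captured digit plus 14 repetitions). The class and method names are hypothetical, chosen for illustration; note that in Java source every backslash of the regex has to be doubled.

```java
import java.util.regex.Pattern;

class RepeatedDigit {
    // (\d) captures one digit; \1{14} requires that same digit 14 more times.
    // In a Java string literal each regex backslash is written twice.
    private static final Pattern FIFTEEN_SAME = Pattern.compile("(\\d)\\1{14}");

    // Hypothetical helper: true if s consists of exactly 15 copies of one digit.
    static boolean isFifteenSameDigits(String s) {
        return FIFTEEN_SAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFifteenSameDigits("7".repeat(15)));       // true
        System.out.println(isFifteenSameDigits("7".repeat(14)));       // false: only 14 sevens
        System.out.println(isFifteenSameDigits("7".repeat(14) + "8")); // false: last digit differs
    }
}
```

Using matches() rather than find() checks the whole string; find() would instead report any run of 15 identical digits inside a longer string.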

Edit

When I pasted it into my code, escaping the backslashes properly:

"(\\d)\\1{14}"

I replaced the back-reference "\1" with "$1", which is what Java uses to refer to captured groups when replacing matches.

Then I realized that it doesn't work. In Java, when you need to back-reference a match inside the regex, you have to use "\N", but when you want to use it in a replacement, the syntax is "$N".
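The difference can be sketched in a few lines, assuming the task is to collapse each run of 15 identical digits down to one digit. The names are hypothetical. \1 appears in the pattern and $1 in the replacement string; the mistaken pattern with $1 written inside it still compiles, but never matches, because $ is parsed as the end-of-input anchor followed by literal 1s.

```java
import java.util.regex.Pattern;

class PatternVsReplacement {
    // Inside the pattern, a back reference is written \1 (escaped as "\\1").
    static final String PATTERN = "(\\d)\\1{14}";

    // Collapses every run of 15 identical digits down to that single digit.
    // In the replacement string, $1 means "the text captured by group 1".
    static String collapseRuns(String input) {
        return input.replaceAll(PATTERN, "$1");
    }

    // The mistake from the question: "$1" inside the pattern is not a back reference.
    // $ anchors to the end of input, so this asks for fourteen "1"s after the end.
    static boolean mistakenPatternMatches(String input) {
        return Pattern.compile("(\\d)$1{14}").matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "x" + "5".repeat(15) + "y";
        System.out.println(collapseRuns(input));           // x5y
        System.out.println(mistakenPatternMatches(input)); // false
    }
}
```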

My question is: why?

Recommended answer

$1 is not a back reference in Java's regexes, nor in any other flavor I can think of. You only use $1 when you are replacing something:

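import org.apache.commons.lang3.StringUtils;

String input = "A12.3 bla bla my input";
// In the replacement argument, $1 stands for the text captured by group 1
input = StringUtils.replacePattern(
            input, "^([A-Z]\\d{2}\\.\\d).*$", "$1");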
//                                            ^^^^
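If Apache Commons Lang is not on the classpath, a rough equivalent uses the JDK's own String.replaceAll, which expands $1 in its replacement argument in the same way. One caveat: StringUtils.replacePattern compiles the regex with DOTALL, so the two can differ on input containing line breaks; for the single-line input above they should give the same result. The class and method names are hypothetical.

```java
class ReplaceWithGroup {
    // Keeps only a leading code such as "A12.3" and drops the rest of the line.
    // Group 1: one capital letter, two digits, a dot, one digit; .* eats the remainder.
    static String keepCode(String input) {
        return input.replaceAll("^([A-Z]\\d{2}\\.\\d).*$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepCode("A12.3 bla bla my input")); // A12.3
        System.out.println(keepCode("no code here"));           // unchanged: no match
    }
}
```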

There is some misinformation about what a back reference is, including in the very place I got that snippet from: "simple java regex with backreference does not work".

Java modeled its regex syntax after other existing flavors in which $ was already a metacharacter. There it anchors to the end of the string (or of a line, in multi-line mode).
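A small sketch of that anchor behavior with java.util.regex, using hypothetical helper names: without flags, $ only matches at the end of the input (or just before a final line terminator), while Pattern.MULTILINE also lets it match at the end of every line.

```java
import java.util.regex.Pattern;

class DollarAnchor {
    // Default mode: $ matches only at the end of the input
    // (or just before a final line terminator).
    static boolean endsWithEnd(String s) {
        return Pattern.compile("end$").matcher(s).find();
    }

    // MULTILINE mode: $ also matches just before each line terminator.
    static boolean anyLineEndsWithEnd(String s) {
        return Pattern.compile("end$", Pattern.MULTILINE).matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "the end\nnot the last line";
        System.out.println(endsWithEnd(text));        // false
        System.out.println(anyLineEndsWithEnd(text)); // true
    }
}
```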

Similarly, Java uses \1 for back references. Because regexes are written as Java string literals, the backslash itself must be escaped: \\1.
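To illustrate the double escaping, here is a sketch (hypothetical names) that prints the literal to show the regex the engine actually receives, then uses the back reference to find the first doubled character in a word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedBackReference {
    // The Java literal "(\\w)\\1" is the regex (\w)\1: a word character followed by itself.
    static final String LITERAL = "(\\w)\\1";

    // Returns the first doubled character pair in s, or null if there is none.
    static String firstDouble(String s) {
        Matcher m = Pattern.compile(LITERAL).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(LITERAL);                   // (\w)\1
        System.out.println(firstDouble("bookkeeper")); // oo
        System.out.println(firstDouble("abc"));        // null
    }
}
```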

From a lexical/syntactic standpoint it is true that $1 could be used unambiguously (as a bonus it would prevent the need for the "evil escaped escape" when using back references).

To match a 1 that comes after the end of a line, the regex would need to be $\n1:

this line
1
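A quick check of that claim, as a sketch with hypothetical names: $\n1 needs Pattern.MULTILINE so that $ can match before a line terminator that is not at the very end of the input; since $ is zero-width, the \n then consumes the newline and 1 matches the next character.

```java
import java.util.regex.Pattern;

class DollarThenOne {
    // Regex $\n1: end of a line, the newline itself, then a literal 1.
    // MULTILINE lets $ match before a line terminator in the middle of the input.
    static final Pattern END_THEN_ONE = Pattern.compile("$\\n1", Pattern.MULTILINE);

    static boolean hasOneAfterLineEnd(String s) {
        return END_THEN_ONE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOneAfterLineEnd("this line\n1")); // true
        System.out.println(hasOneAfterLineEnd("this line 1"));  // false
    }
}
```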

It just makes more sense to use a familiar syntax instead of changing the rules, most of which came from Perl.

The first version of Perl came out in 1987, much earlier than Java, which was released in beta in 1995.

I dug up the following passage from the Perl documentation:

The bracketing construct ( ... ) may also be used, in which case \<digit> matches the digit'th substring. (Outside of the pattern, always use $ instead of \ in front of the digit. The scope of $<digit> (and $`, $& and $') extends to the end of the enclosing BLOCK or eval string, or to the next pattern match with subexpressions. The \<digit> notation sometimes works outside the current pattern, but should not be relied upon.) You may have as many parentheses as you wish. If you have more than 9 substrings, the variables $10, $11, ... refer to the corresponding substring. Within the pattern, \10, \11, etc. refer back to substrings if there have been at least that many left parens before the backreference. Otherwise (for backward compatibility) \10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through \9 are always backreferences.)
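For comparison with the Perl rules quoted above, Java's java.util.regex treats multi-digit references differently: per the Pattern documentation, \1 through \9 are always back references, and a larger number is only read as one if that many groups exist at that point; otherwise digits are dropped until it names an existing group. Replacement strings follow a similar rule for $g. A sketch with hypothetical names:

```java
import java.util.regex.Pattern;

class MultiDigitReferences {
    // Only one group exists, so Java reads \11 as back reference \1 followed by
    // a literal "1", rather than as an octal escape.
    static boolean backRefThenDigit(String s) {
        return Pattern.compile("(a)\\11").matcher(s).matches();
    }

    // In a replacement string, $11 with only one group likewise becomes
    // group 1 followed by the literal text "1".
    static String replaceWithDollarEleven(String s) {
        return s.replaceAll("(x)", "$11");
    }

    public static void main(String[] args) {
        System.out.println(backRefThenDigit("aa1"));      // true
        System.out.println(backRefThenDigit("a\t"));      // false: \11 is not a tab here
        System.out.println(replaceWithDollarEleven("x")); // x1
    }
}
```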
