Why does the look-behind expression in this regex not have an "obvious maximum length"?


Question

Given a string containing some number of square brackets and other characters, I want to find all closing square brackets preceded by an opening square bracket and some number of letters. For instance, if the string is


] [abc] [123] abc]

I want to find only the second closing bracket.

The following regex

(?<=[a-z]+)\]

will find me the second closing bracket, but also the last one:


] [abc] [123] abc]

Since I want to find only the first one, I make the obvious change to the regex...


(?<=\[[a-z]+)\]

...and I get "Look-behind group does not have an obvious maximum length near index 11."

\[ is only a single character, so it seems like the obvious maximum length should be 1 + whatever the obvious maximum length was of the look-behind group in the first expression. What gives?

ETA: It's not specific to the opening bracket.


(?<=a[b-z]+)\]

gives me the same error. (Well, at index 12.)

Recommended answer


\[ is only a single character, so it seems like the obvious maximum length should be 1 + whatever the obvious maximum length was of the look-behind group in the first expression. What gives?

That's the point: "whatever the obvious maximum length was of the look-behind group in the first expression" is not obvious. A rule of thumb is that you can't use + or * inside a look-behind, because such a quantifier puts no upper bound on how much text it can match. This is not only so for Java's regex engine, but for many more PCRE-flavored engines (even Perl's (v5.10) engine!).
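To illustrate the rule of thumb, here is a minimal sketch that swaps the unbounded + for a bounded quantifier, which gives the look-behind a finite maximum length that Java's engine accepts. The cap of 100 letters is an assumption about the input, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookBehind {
    // Assumption: no bracketed word is longer than 100 letters.
    // {1,100} has an upper bound, so the look-behind has a known maximum length.
    private static final Pattern CLOSING =
            Pattern.compile("(?<=\\[[a-z]{1,100})\\]");

    /** Returns the index of every ']' preceded by '[' and 1 to 100 lowercase letters. */
    static List<Integer> closingBracketIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = CLOSING.matcher(input);
        while (m.find()) {
            indexes.add(m.start()); // the match itself is only the ']'
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(closingBracketIndexes("] [abc] [123] abc]")); // prints [6]
    }
}
```

Because the match consists of the bracket alone, the same pattern could also be passed to replaceAll to replace just that character. If longer words are possible, the upper bound would need to be raised accordingly.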

You can, however, do this with a look-ahead:

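import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The capture group inside the look-ahead records where "[letters]" ends.
Pattern p = Pattern.compile("(?=(\\[[a-z]+]))");
Matcher m = p.matcher("] [abc] [123] abc]");
while (m.find()) {
  System.out.println("Found a ']' before index: " + m.end(1));
}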

(That is, a capture group (!) inside the look-ahead, which can be used to get the end(...) of the group.)

will print:

Found a ']' before index: 7
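If the position of the bracket itself is wanted rather than the index just after it, one possible variation needs no look-around at all: match the whole "[letters]" sequence and capture only the closing bracket. This is a sketch rather than part of the original answer, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketIndex {
    // Group 1 captures only the closing bracket of a "[letters]" sequence.
    private static final Pattern WORD_IN_BRACKETS =
            Pattern.compile("\\[[a-z]+(\\])");

    /** Returns the index of the first qualifying ']', or -1 if there is none. */
    static int firstClosingBracket(String input) {
        Matcher m = WORD_IN_BRACKETS.matcher(input);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstClosingBracket("] [abc] [123] abc]")); // prints 6
    }
}
```

Here m.start(1) is 6, one less than the m.end(1) value of 7 reported above, since the capture group covers a single character.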



EDIT



And if you're interested in replacing such ]'s, you could do something like this:

String s = "] [abc] [123] abc] [foo] bar]";
System.out.println(s);
System.out.println(s.replaceAll("(\\[[a-z]+)]", "$1_"));

which will print:

] [abc] [123] abc] [foo] bar]
] [abc_ [123] abc] [foo_ bar]
