Backreferences in lookbehind

Question

Can you use backreferences in a lookbehind?

Let's say I want to split wherever the two characters just behind me are the same character repeated twice.

    String REGEX1 = "(?<=(.)\\1)"; // DOESN'T WORK!
    String REGEX2 = "(?<=(?=(.)\\1)..)"; // WORKS!

    System.out.println(java.util.Arrays.toString(
        "Bazooka killed the poor aardvark (yummy!)"
        .split(REGEX2)
    )); // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"

Using REGEX2 (where the backreference is in a lookahead nested inside a lookbehind) works, but REGEX1 gives this error at run-time:

    Look-behind group does not have an obvious maximum length near index 8
    (?<=(.)\1)
            ^
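Both behaviours can be reproduced in one self-contained program. This is a minimal sketch assuming a standard JDK `java.util.regex`; the class and method names (`LookbehindBackrefDemo`, `isRejected`, `splitAfterDoubles`) are hypothetical and not from the original post.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; names are illustrative, not from the original post.
public class LookbehindBackrefDemo {
    // Backreference directly inside the lookbehind: Pattern.compile rejects it.
    public static final String REGEX1 = "(?<=(.)\\1)";
    // Backreference inside a lookahead nested in a fixed two-character lookbehind.
    public static final String REGEX2 = "(?<=(?=(.)\\1)..)";

    // Returns true if the pattern fails to compile, printing the JDK's reason.
    public static boolean isRejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            System.out.println("Rejected: " + e.getDescription());
            return true;
        }
    }

    // Splits after every pair of identical adjacent characters.
    public static String[] splitAfterDoubles(String s) {
        return s.split(REGEX2);
    }

    public static void main(String[] args) {
        System.out.println(isRejected(REGEX1));
        System.out.println(isRejected(REGEX2));
        System.out.println(Arrays.toString(
            splitAfterDoubles("Bazooka killed the poor aardvark (yummy!)")));
    }
}
```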

This sort of makes sense, I suppose, because in general a backreference can capture a string of any length (though if the regex compiler were a bit smarter, it could determine that \1 is (.) in this case and therefore has a finite length).

So is there a way to use a backreference in a lookbehind?

And if there isn't, can you always work around it using this nested lookahead? Are there other commonly-used techniques?
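One alternative that avoids lookbehind entirely (not taken from the answer below) is to scan with a zero-width lookahead `(?=(.)\\1)`, which Java accepts because lookaheads carry no length restriction, and cut the string two characters after each match start. A minimal sketch; `LookaheadSplit` and `splitAfterDoublesViaLookahead` are hypothetical names, and the helper only mimics `String.split` to the extent of not emitting a trailing empty piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original question or answer.
public class LookaheadSplit {
    // Zero-width match at index p whenever s[p] == s[p+1]; lookaheads may contain backreferences.
    private static final Pattern DOUBLE_AHEAD = Pattern.compile("(?=(.)\\1)");

    // Cuts the string right after each doubled character, i.e. at p + 2.
    public static List<String> splitAfterDoublesViaLookahead(String s) {
        List<String> pieces = new ArrayList<>();
        Matcher m = DOUBLE_AHEAD.matcher(s);
        int prev = 0;
        while (m.find()) {            // find() steps past each empty match by itself
            int cut = m.start() + 2;  // the pair occupies [start, start + 2)
            pieces.add(s.substring(prev, cut));
            prev = cut;
        }
        if (prev < s.length()) {      // remainder after the last cut, if any
            pieces.add(s.substring(prev));
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitAfterDoublesViaLookahead(
            "Bazooka killed the poor aardvark (yummy!)"));
    }
}
```

Because the lookahead fires at every start position, overlapping runs such as `aaa` are cut after both pairs, which is the same set of positions the nested-lookbehind pattern produces.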

Recommended answer

Looks like your suspicion is correct that backreferences generally can't be used in Java lookbehinds. The workaround you proposed makes the finite length of the lookbehind explicit and looks very clever to me.
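The same trick appears to extend to any captured group whose length is fixed and known in advance: the lookahead performs the backreference check, and the bare dots state the total width. A sketch, assuming a hypothetical goal of splitting after a two-character unit that is immediately repeated (such as `abab`); the class name `RepeatedUnitSplit` is illustrative. The dot count must equal the full width of what the lookahead checks, since the lookahead is not limited to the lookbehind's window.

```java
import java.util.Arrays;

// Hypothetical example class; the pattern generalizes the question's REGEX2.
public class RepeatedUnitSplit {
    // Lookahead checks "(..)\\1" (a 2-char unit repeated); "...." fixes the width at 4 = 2 * 2.
    public static final String REPEATED_PAIR = "(?<=(?=(..)\\1)....)";

    // Splits at every position preceded by a repeated two-character unit.
    public static String[] splitAfterRepeatedPair(String s) {
        return s.split(REPEATED_PAIR);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAfterRepeatedPair("xababyz")));
    }
}
```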

I was intrigued to find out what Python does with this regex. Python only supports fixed-length lookbehind, not finite-length like Java, but this regex is fixed length. I couldn't use re.split() directly because Python's re.split() never splits on an empty match (this changed in Python 3.7, which does split on empty matches), but I think I found a bug in re.sub():

    >>> import re
    >>> r=re.compile("(?<=(.)\\1)")
    >>> a=re.sub(r,"|", "Bazooka killed the poor aardvark (yummy!)")
    >>> a
    'Bazo|oka kil|led the po|or a|ardvark (yum|my!)'

The lookbehind matches between the two duplicate characters!
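For comparison, here is a minimal Java sketch of the same substitution, assuming the question's nested-lookahead pattern (the class name `MarkDoubles` is hypothetical). With `replaceAll`, the marker lands after each pair of duplicates rather than between them, which is the placement the question intends.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class; uses the question's REGEX2 pattern.
public class MarkDoubles {
    // Zero-width: matches at positions preceded by two identical characters.
    private static final Pattern AFTER_DOUBLE = Pattern.compile("(?<=(?=(.)\\1)..)");

    // Inserts "|" at every position the lookbehind accepts.
    public static String markAfterDoubles(String s) {
        return AFTER_DOUBLE.matcher(s).replaceAll("|");
    }

    public static void main(String[] args) {
        System.out.println(markAfterDoubles("Bazooka killed the poor aardvark (yummy!)"));
    }
}
```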
