General approach for (equivalent of) "backreferences within character class"?


Question

In Perl regexes, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not when they appear within a character class. There, the \ is treated as an ordinary escape character, so \1 is read as an octal escape (the character with code 1) rather than as the captured text.

Therefore, if (for example) one wanted to match a string (of length greater than 1) whose first character matches its last character, but does not appear anywhere else in the string, the following regex will not do:

/\A       # match beginning of string;
 (.)      # match and capture first character (referred to subsequently by \1);
 [^\1]*   # (WRONG) match zero or more characters different from character in \1;
 \1       # match \1;
 \z       # match the end of the string;
/sx       # s: let . match newline; x: ignore whitespace, allow comments

It does not work, since it matches (for example) the string 'a1a2a':

  DB<1> ( 'a1a2a' =~ /\A(.)[^\1]*\1\z/ and print "fail!" ) or print "success!"
fail!
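
A quick check of what the class [\1] actually accepts helps explain this result. The sketch below is only an illustration (the helper name in_class_backslash_one is made up here): it assumes that Perl parses \1 inside a bracketed class as an octal escape, so the class contains chr(1) and neither the digit 1 nor the captured character.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $char is matched by the class [\1], else 0.
# Inside the brackets \1 is NOT a backreference; it is assumed to be parsed
# as the octal escape \001.
sub in_class_backslash_one {
    my ($char) = @_;
    return $char =~ /\A[\1]\z/ ? 1 : 0;
}

printf "chr(1): %d\n", in_class_backslash_one(chr(1));
printf "'1':    %d\n", in_class_backslash_one('1');
printf "'a':    %d\n", in_class_backslash_one('a');
```

If that assumption holds, [^\1]* accepts every character of '1a2', which is why 'a1a2a' slips through.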

I can usually manage to find some workaround (see footnote 1), but it's always rather problem-specific, and usually far more complicated-looking than what I would do if I could use backreferences within a character class.

Is there a general (and hopefully straightforward) workaround?

1 For example, for the problem in the example above, I'd use something like

/\A
 (.)              # match and capture first character (referred to subsequently
                  # by \1);
 (?!.*\1.+\z)     # a negative lookahead assertion for "a suffix containing \1";
 .*               # substring not containing \1 (as guaranteed by the preceding
                  # negative lookahead assertion);
 \1\z             # match last character only if it is equal to the first one
/sx

...where I've replaced the reasonably straightforward (though, alas, incorrect) subexpression [^\1]* in the earlier regex with the somewhat more forbidding negative lookahead assertion (?!.*\1.+\z). This assertion basically says "give up if \1 appears anywhere beyond this point (other than at the last position)." Incidentally, I give this solution just to illustrate the sort of workarounds I referred to in the question. I don't claim that it is a particularly good one.
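
To see how the lookahead workaround behaves, here is a small sketch that runs it against a few strings. The helper name matches_workaround and the sample strings are assumptions added for illustration; the pattern is the one from the footnote, written on one line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the first character equals the last
# character and does not occur anywhere in between, else 0.
sub matches_workaround {
    my ($str) = @_;
    return $str =~ /\A(.)(?!.*\1.+\z).*\1\z/s ? 1 : 0;
}

# Sample inputs chosen for illustration only.
for my $s ('aba', 'aa', 'a1a2a', 'ab', 'a') {
    printf "%-6s => %s\n", $s, matches_workaround($s) ? 'match' : 'no match';
}
```

With these inputs, 'aba' and 'aa' should match, while 'a1a2a', 'ab' and 'a' should not.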

Recommended answer

This can be accomplished with a negative lookahead within a repeated group:

/\A         # match beginning of string;
 (.)        # match and capture first character (referred to subsequently by \1);
 ((?!\1).)* # match zero or more characters different from character in \1;
 \1         # match \1;
 \z         # match the end of the string;
/sx

This pattern can be used even if the group contains more than one character.
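
As a rough sketch of both cases, the code below wraps the pattern in two helpers. The names first_last_unique and delimited_by_pair, the two-character variant, and the sample strings are assumptions made for illustration. The only change from the pattern above is a non-capturing group, (?: ... ), so that no second capture is created.

```perl
use strict;
use warnings;

# Hypothetical helper: single-character case. (?:(?!\1).)* stands in for
# the [^\1]* that Perl does not support.
sub first_last_unique {
    my ($str) = @_;
    return $str =~ /\A(.)(?:(?!\1).)*\1\z/s ? 1 : 0;
}

# Hypothetical helper: same idea with a two-character captured group.
# The lookahead rejects any position where the captured pair starts again.
sub delimited_by_pair {
    my ($str) = @_;
    return $str =~ /\A(..)(?:(?!\1).)*\1\z/s ? 1 : 0;
}

printf "%-9s => %d\n", $_, first_last_unique($_) for 'aba', 'aa', 'a1a2a';
printf "%-9s => %d\n", $_, delimited_by_pair($_) for 'abxyab', 'abab', 'abxabyab';
```

The construct consumes one character at a time, and only where the captured text does not begin at that position. This is why it works for captures of any length, where a character class could only ever exclude single characters.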
