"Variable length lookbehind not implemented" but it isn't variable length


Problem description

I am trying to diagnose a very crazy regex. It is also very long, but I have cut it down to the following script, run using Strawberry Perl v5.26.2.

use strict;
use warnings;

my $text = "M Y H A P P Y T E X T";
my $regex = '(?i)(?<!(Mon|Fri|Sun)day |August )abcd(?-i)';

if ($text =~ m/$regex/){
    print "true\n";
}
else {
    print "false\n";
}

This gives the error "Variable length lookbehind not implemented in regex."

I am hoping you can help with several issues:

  1. I don't see why this error would occur, because all of the possible lookbehind values are 7 characters: "Monday ", "Friday ", "Sunday ", "August ".
  2. I did not write this regex myself, and I am not sure how to interpret the syntax (?i) and (?-i). When I get rid of the (?i), the error actually goes away. How will Perl interpret this part of the regex? I would think the first two characters are evaluated as "optional literal parenthesis", except that the parentheses aren't escaped, and in that case I would get a different syntax error because the closing parenthesis would then not be matched.
  3. This behavior starts somewhere between Perl 5.16.3_64 and 5.26.1_64, at least in Strawberry Perl. The former version is fine with the code, the latter is not. Why did it start?

Recommended answer

I have reduced your problem to this:

my $text = 'M Y H A P P Y T E X T';
my $regex = '(?<!st)A';
print ($text =~ m/$regex/i ? "true\n" : "false\n");

The cause is the /i (case-insensitive) modifier combined with certain character sequences, such as "ss" or "st", that can also match a single character (a typographic ligature) under Unicode case folding. That makes the lookbehind variable length: /August/i matches, for instance, both "AUGUST" (6 characters) and "augu" followed by U+FB06 (5 characters, the last one being the "st" ligature).

However, if we remove the /i (case-insensitive) modifier, it works, because typographic ligatures are then not matched.
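The folding behavior described above can be observed directly. The following is a minimal sketch, not part of the original answer; the variable names are made up for illustration, and it assumes a Perl recent enough to support multi-character case folding in regex matching.

```perl
use strict;
use warnings;

# U+FB06 is LATIN SMALL LIGATURE ST: one character that case-folds to "st".
my $ligature = "\x{FB06}";

# With /i, the two-character pattern "st" can match that single character.
my $with_i    = ($ligature =~ /st/i) ? 1 : 0;
# Without /i, only a literal "s" followed by "t" matches.
my $without_i = ($ligature =~ /st/)  ? 1 : 0;

# Two strings of different length that both satisfy /August/i.
my $long  = "AUGUST";          # 6 characters
my $short = "augu\x{FB06}";    # 5 characters, ending in the ligature
my $both_match = ($long =~ /August/i && $short =~ /August/i) ? 1 : 0;

printf "ligature matches /st/i: %d, matches /st/: %d\n", $with_i, $without_i;
printf "lengths %d and %d, both match /August/i: %d\n",
    length($long), length($short), $both_match;
```

Because the same pattern can consume either 5 or 6 characters, the regex engine cannot know how far back to step for the lookbehind, which is what the error message is reporting.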

Solution: use the aa modifier, i.e.:

/(?<!st)A/iaa
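As a rough illustration of how this can be applied, the sketch below runs the reduced pattern with the modifier on the match operator and also with the same flags written inline as (?iaa). The inline form is an assumption about what suits a pattern kept in a string, as in the question; there, (?i) and (?-i) are inline modifiers that switch case-insensitive matching on and off, not literal parentheses. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = 'M Y H A P P Y T E X T';

# Modifier placed on the match operator.
my $on_operator = ($text =~ /(?<!st)A/iaa) ? "true" : "false";

# The same flags written inline, for a pattern held in a string
# (hypothetical variant of the reduced pattern).
my $regex  = '(?iaa)(?<!st)A';
my $inline = ($text =~ m/$regex/) ? "true" : "false";

# An "A" directly preceded by "st" in any case is still rejected.
my $blocked = ("STA" =~ m/$regex/) ? "true" : "false";

print "operator form: $on_operator\n";
print "inline form:   $inline\n";
print "after 'ST':    $blocked\n";
```

With /aa in effect, ASCII characters in the pattern can only match ASCII characters, so "st" always consumes exactly two characters and the lookbehind has a fixed length.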

Or in your regex:

my $text = 'M Y H A P P Y T E X T';
my $regex = '(?<!(Mon|Fri|Sun)day |August )abcd';
print ($text =~ m/$regex/iaa ? "true\n" : "false\n");

From perlre:

To forbid ASCII/non-ASCII matches (like "k" with \N{KELVIN SIGN}), specify the "a" twice, for example /aai or /aia. (The first occurrence of "a" restricts the \d, etc., and the second occurrence adds the /i restrictions.) But, note that code points outside the ASCII range will use Unicode rules for /i matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
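The quoted passage can be checked with a short script. This is only a sketch with illustrative variable names; the Greek omega pair is an example chosen here, not taken from perlre.

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";    # KELVIN SIGN, which case-folds to ASCII "k"

# Plain /i: Unicode folding pairs the Kelvin sign with "k".
my $plain_i = ($kelvin =~ /k/i)   ? 1 : 0;
# /iaa: an ASCII "k" may no longer match a non-ASCII character.
my $with_aa = ($kelvin =~ /k/iaa) ? 1 : 0;

# Both sides outside ASCII: Unicode rules still apply under /aa.
# U+03C9 is Greek small omega, U+03A9 is Greek capital omega.
my $non_ascii = ("\x{3C9}" =~ /\x{3A9}/iaa) ? 1 : 0;

# The "st" ligature (U+FB06) is likewise kept apart from ASCII "st".
my $ligature_aa = ("\x{FB06}" =~ /st/iaa) ? 1 : 0;

printf "Kelvin vs k:  /i=%d  /iaa=%d\n", $plain_i, $with_aa;
printf "omega pair under /iaa: %d\n", $non_ascii;
printf "st ligature under /iaa: %d\n", $ligature_aa;
```

The last check is the one that matters for the lookbehind: once the ligature can no longer stand in for "st", every alternative has a single fixed length.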

See a closely related discussion here.
