"Variable length lookbehind not implemented" but it isn't variable length
Question
I have a very crazy regex that I'm trying to diagnose. It is also very long, but I have cut it down to just the following script, which I run with Strawberry Perl v5.26.2.
use strict;
use warnings;
my $text = "M Y H A P P Y T E X T";
my $regex = '(?i)(?<!(Mon|Fri|Sun)day |August )abcd(?-i)';
if ($text =~ m/$regex/){
print "true\n";
}
else {
print "false\n";
}
This gives the error "Variable length lookbehind not implemented in regex".
I am hoping you can help with several issues:
- I don't see why this error would occur, because all of the possible lookbehind values are 7 characters: "Monday ", "Friday ", "Sunday ", "August ".
- I did not write this regex myself, and I am not sure how to interpret the syntax (?i) and (?-i). When I get rid of the (?i), the error actually goes away. How will Perl interpret this part of the regex? I would think the first two characters are evaluated as "optional literal parenthesis", except that the parenthesis isn't escaped, and in that case I would get a different syntax error because the closing parenthesis would then be unmatched.
- This behavior starts somewhere between Perl 5.16.3_64 and 5.26.1_64, at least in Strawberry Perl. The former version is fine with the code, the latter is not. Why did it start?
Recommended answer
I have reduced your problem to this:
my $text = 'M Y H A P P Y T E X T';
my $regex = '(?<!st)A';
print ($text =~ m/$regex/i ? "true\n" : "false\n");
Because of the /i (case-insensitive) modifier, certain character combinations such as "ss" or "st" can also match a single typographic ligature character, which makes the lookbehind variable length. For instance, /August/i matches both AUGUST (6 characters) and auguﬆ (5 characters, the last one being U+FB06).
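The length difference can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names are illustrative, and the ligature is written as the escape \x{FB06} so the script does not depend on the source file's encoding.

use strict;
use warnings;

# "augu" followed by U+FB06 (LATIN SMALL LIGATURE ST): five characters in total.
my $ligature_word = "augu\x{FB06}";
my $plain_word    = "AUGUST";

# Under /i, the two-character sequence "st" can also match the single ligature character.
my $ligature_matches_i = $ligature_word =~ /August/i ? 1 : 0;

# With /aa added, ASCII letters are no longer allowed to match non-ASCII characters.
my $ligature_matches_iaa = $ligature_word =~ /August/iaa ? 1 : 0;

printf "length plain: %d, length with ligature: %d\n",
    length($plain_word), length($ligature_word);
printf "/i matches the ligature form:   %s\n", $ligature_matches_i   ? "yes" : "no";
printf "/iaa matches the ligature form: %s\n", $ligature_matches_iaa ? "yes" : "no";

Since the same pattern can consume either five or six characters under /i, the regex compiler cannot treat a lookbehind containing "st" as fixed length.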
However, if we remove the /i (case-insensitive) modifier, it works, because typographic ligatures are then not matched.
Solution: use the aa modifier, i.e.:
/(?<!st)A/iaa
Or in your regex:
my $text = 'M Y H A P P Y T E X T';
my $regex = '(?<!(Mon|Fri|Sun)day |August )abcd';
print ($text =~ m/$regex/iaa ? "true\n" : "false\n");
From perlre:
To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"), specify the "a" twice, for example /aai or /aia. (The first occurrence of "a" restricts the \d, etc., and the second occurrence adds the "/i" restrictions.) But, note that code points outside the ASCII range will use Unicode rules for /i matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
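If the pattern has to stay in a string with inline flags, as in the question, the same modifiers can presumably be written inline as (?iaa). The sketch below is an illustration under that assumption, not part of the original answer: try_compile is a hypothetical helper, and the sample texts are made up. Whether the (?i) variant compiles depends on the Perl version, so no particular result is claimed for it.

use strict;
use warnings;

# Hypothetical helper: compile a pattern string, returning undef if compilation dies.
sub try_compile {
    my ($pattern) = @_;
    return eval { qr/$pattern/ };
}

my $lookbehind = '(?<!(Mon|Fri|Sun)day |August )abcd';

# Inline (?i): may die with "Variable length lookbehind not implemented",
# or compile with a warning, depending on the Perl version.
my $with_i = try_compile("(?i)$lookbehind");

# Inline (?iaa): case-insensitive, but ASCII letters cannot match non-ASCII
# characters, so every lookbehind alternative stays 7 characters long.
my $with_iaa = try_compile("(?iaa)$lookbehind");

print "(?i) compiles:   ", (defined $with_i   ? "yes" : "no"), "\n";
print "(?iaa) compiles: ", (defined $with_iaa ? "yes" : "no"), "\n";

# Made-up sample texts to exercise the (?iaa) pattern.
my %results;
for my $text ('M Y H A P P Y T E X T', 'see abcd', 'MONDAY ABCD', 'August abcd') {
    $results{$text} = ($text =~ $with_iaa) ? 1 : 0;
    print "$text => ", ($results{$text} ? "true" : "false"), "\n";
}

The trailing (?-i) from the question is dropped here, as in the answer's version: it only switches case-insensitivity off again for whatever follows, and nothing follows it in this pattern.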