Why is using a POSIX character class in my regex pattern giving unexpected results?


Question

I have encountered some strange Perl behavior: using a POSIX character class in a regexp completely alters the sort order of the resulting strings.

Here is my test program:

sub namecmp($a,$b) {
  $a=~/([:alpha:]*)/;
  # $a=~/([a-z]*)/;
  $aword= $1;

  $b=~/([:alpha:]*)/;
  # $b=~/([a-z]*)/;
  $bword= $1;
  return $aword cmp $bword;
};

$_= <>;
@names= sort namecmp split;
print join(" ", @names), "\n";

If you switch to the commented-out regexps using [a-z], you get the normal, lexicographic sort order. However, the POSIX [:alpha:] character class yields some weird-ass sort order, as follows:

$test_normal
aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb

$test_posix
aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
baa bab bac bba bbb bbc bca bcb bcc caa cbb aba abb abc aca acb acc aab aac aaa

My best guess is that the POSIX character class is activating some kind of locale stuff I've never heard of and didn't ask for. I suppose the logical reaction to "doctor, doctor, it hurts when I do this!" is, "well, don't do that, then!".

But can anyone tell me what's happening here, and why? I'm using Perl 5.10, but I believe it behaves the same way under Perl 5.8.

Recommended answer

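The character class [:alpha:] represents alphabetic characters in Perl regular expressions, but it is only recognized inside a bracketed character class: the square brackets in [:alpha:] are part of the class name, not the usual character-class delimiters. Written on its own, /[:alpha:]/ is an ordinary bracketed class that matches any one of the characters ':', 'a', 'l', 'p' and 'h'. With your test data, that means only a leading run of 'a' is captured: "baa" captures the empty string, "aba" captures "a", "aab" captures "aa" and "aaa" captures "aaa", which is exactly the order you are seeing. So you need: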

$a=~/([[:alpha:]]*)/;
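To see the difference directly, it can help to print what each pattern captures. The following is only a minimal sketch: the helper name `captured` and the sample words are made up for illustration, and the warning that Perl issues for the unbracketed form is deliberately silenced.

```perl
use strict;
use warnings;

# Hypothetical helper: return what the first capture group grabs from $word.
sub captured {
    my ($re, $word) = @_;
    return $word =~ $re ? $1 : undef;
}

# Unbracketed form: an ordinary class containing ':', 'a', 'l', 'p', 'h'.
# Perl warns about this syntax (category 'regexp'), so silence it here.
my $plain = do { no warnings 'regexp'; qr/([:alpha:]*)/ };

# Bracketed form: the real POSIX alphabetic class.
my $posix = qr/([[:alpha:]]*)/;

for my $word (qw(aaa aab aba baa cbb)) {
    printf "%-4s plain='%s' posix='%s'\n",
        $word, captured($plain, $word), captured($posix, $word);
}
```

With the unbracketed pattern, any word starting with 'b' or 'c' captures the empty string, so all of those compare equal and sort ahead of the words starting with 'a'.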

This is mentioned in perlre:

The POSIX character class syntax

[:class:]

is also available. Note that the [ and ] brackets are literal; they must always be used within a character class expression.

# this is correct:
$string =~ /[[:alpha:]]/;

# this is not, and will generate a warning:
$string =~ /[:alpha:]/;
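Applied to the comparison routine from the question, the fix might look like the sketch below. It is one possible rewrite rather than the only one: the sample input list is invented for illustration, and the parameter list on the sub is dropped because `sort` passes the two items in the package variables `$a` and `$b`.

```perl
use strict;
use warnings;

# Compare two names by their leading run of alphabetic characters.
# sort sets the package variables $a and $b before each call.
sub namecmp {
    my ($aword) = $a =~ /([[:alpha:]]*)/;   # list context returns the capture
    my ($bword) = $b =~ /([[:alpha:]]*)/;
    return $aword cmp $bword;
}

my @input = qw(cbb aba baa aaa aab);        # illustrative sample data
my @names = sort namecmp @input;
print join(" ", @names), "\n";              # prints "aaa aab aba baa cbb"
```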
