List::Util - reduce - length - encoding - question
Why do I get a wrong result (aaaaaa instead of bbbbbbbbbbbbbbb) from the first reduce example?
test.txt
__BE
bb bbbbbbbbbbbbbbb
aaaaaa
test.pl
#!/usr/bin/env perl
use warnings; use 5.012;
use open ':encoding(UTF-8)';
use List::Util qw(reduce);
use Encode;
my( @list, $longest, $len );
open my $fh, '<', 'test.txt' or die $!;
while( my $line = readline( $fh ) ) {
chomp $line;
push @list, split( /\s+/, $line );
}
close $fh;
$longest = reduce{ length($a) > length($b) ? $a : $b } @list;
$len = length $longest;
say $longest; # aaaaaa
say $len; # 6
$longest = reduce{ length(Encode::encode_utf8($a)) > length(Encode::encode_utf8($b)) ? $a : $b } @list;
$len = length(Encode::encode_utf8($longest));
say $longest; # bbbbbbbbbbbbbbb
say $len; # 15
$longest = $list[0];
$len = length $longest;
for my $str (@list) {
if ( length($str) > $len ) {
$longest = $str;
$len = length($str);
}
}
say $longest; # bbbbbbbbbbbbbbb
say $len; # 15
AFAICS, it might even be a bug in Perl; it certainly isn't obvious that it is behaving correctly. I modified the first reduce to print diagnostics as it goes:
#!/usr/bin/env perl
use warnings; use 5.012;
use open ':encoding(UTF-8)';
use List::Util qw(reduce);
use Encode;
my( @list, $longest, $len );
open my $fh, '<', 'test.txt' or die $!;
while( my $line = readline( $fh ) ) {
chomp $line;
push @list, split( /\s+/, $line );
}
close $fh;
$longest = reduce { say "<<$a>>/<<$b>> : ", length($a), " : ", length($b);
length($a) > length($b) ? $a : $b } @list;
$len = length $longest;
say $longest; # aaaaaa
say $len; # 6
$longest = reduce { length(Encode::encode_utf8($a)) > length(Encode::encode_utf8($b)) ? $a : $b } @list;
$len = length(Encode::encode_utf8($longest));
say $longest; # bbbbbbbbbbbbbbb
say $len; # 15
$longest = $list[0];
$len = length $longest;
for my $str (@list) {
if ( length($str) > $len ) {
$longest = $str;
$len = length($str);
}
}
say $longest; # bbbbbbbbbbbbbbb
say $len; # 15
When run on Mac OS X (10.6.5) using Perl 5.13.4, this is the output I get:
<<>>/<<__BE>> : 0 : 4
<<__BE>>/<<>> : 0 : 0
<<>>/<<bb>> : 0 : 2
<<bb>>/<<bbbbbbbbbbbbbbb>> : 0 : 15
<<bbbbbbbbbbbbbbb>>/<<>> : 0 : 0
<<>>/<<aaaaaa>> : 0 : 6
aaaaaa
6
bbbbbbbbbbbbbbb
15
bbbbbbbbbbbbbbb
15
To all appearances, length($a) in the first reduce always returns 0, even on those odd occasions when $a visibly contains data (<<__BE>> and <<bbbbbbbbbbbbbbb>> above).
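One way to narrow this down is to check whether the value of $a is wrong or only the length reported for it. The sketch below prints length($a) next to the length of a plain copy of $a, plus Perl's internal UTF-8 flag as reported by utf8::is_utf8. Assumptions: an in-memory string stands in for test.txt (so the empty fields seen above do not appear), and $copy is just an illustrative name. I haven't run it on 5.13.4, so I can't say what it prints there; on a Perl where reduce behaves, the two lengths should agree at every step.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(reduce);

# Stand-in for test.txt (assumption): the same words, read from an in-memory file.
my $data = "__BE\nbb bbbbbbbbbbbbbbb\naaaaaa\n";
open my $fh, '<:encoding(UTF-8)', \$data or die "open: $!";
my @list;
while ( my $line = readline($fh) ) {
    chomp $line;
    push @list, split( /\s+/, $line );
}
close $fh;

# At each step, print length($a) next to the length of a plain copy of $a,
# and whether $a has Perl's internal UTF-8 flag set.
my $longest = reduce {
    my $copy = $a;    # hypothetical name: an independent copy of the current value
    printf "<<%s>> length(\$a)=%d length(copy)=%d utf8=%d\n",
        $a, length($a), length($copy), utf8::is_utf8($a) ? 1 : 0;
    length($a) > length($b) ? $a : $b;
} @list;
print "result: $longest\n";
```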
If the use open ':encoding(UTF-8)'; line is removed, then it behaves sanely:
<<>>/<<__BE>> : 0 : 4
<<__BE>>/<<>> : 4 : 0
<<__BE>>/<<bb>> : 4 : 2
<<__BE>>/<<bbbbbbbbbbbbbbb>> : 4 : 15
<<bbbbbbbbbbbbbbb>>/<<>> : 15 : 0
<<bbbbbbbbbbbbbbb>>/<<aaaaaa>> : 15 : 6
bbbbbbbbbbbbbbb
15
bbbbbbbbbbbbbbb
15
bbbbbbbbbbbbbbb
15
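Because the only difference between the two runs is the input layer, the comparison can be packaged as a self-contained test that doesn't depend on test.txt. A sketch using Test::More, assuming an in-memory string with the same words stands in for the file: it reads the data once with no layer and once through :encoding(UTF-8), checks that only the decoding layer turns on the UTF-8 flag, and checks that the question's reduce finds the longest word either way. I haven't confirmed whether this in-memory version reproduces the failure on 5.13.4.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(reduce);
use Test::More tests => 4;

# Stand-in for test.txt (assumption): the same words, read from an in-memory file.
my $data = "__BE\nbb bbbbbbbbbbbbbbb\naaaaaa\n";

my %longest;
for my $layer ( '<', '<:encoding(UTF-8)' ) {
    open my $fh, $layer, \$data or die "open ($layer): $!";
    my @list;
    while ( my $line = readline($fh) ) {
        chomp $line;
        push @list, split( /\s+/, $line );
    }
    close $fh;

    # Only the decoding layer should turn on the UTF-8 flag (for every word).
    my $flagged = grep { utf8::is_utf8($_) } @list;
    is( $flagged, ( $layer eq '<' ? 0 : scalar @list ), "UTF-8 flag count with '$layer'" );

    # The reduce from the question should find the same word either way.
    $longest{$layer} = reduce { length($a) > length($b) ? $a : $b } @list;
    is( $longest{$layer}, 'bbbbbbbbbbbbbbb', "reduce result with '$layer'" );
}
```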
That might suggest that the bug is somewhere in the interaction of file I/O, UTF-8 encoding and List::Util. On the other hand, it could be somewhere more obscure. But my impression is that you have a test case that is reproducible and could be reported as a possible bug somewhere in Perl and its core modules.
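In the meantime, if all you need is the longest word, one way to sidestep the suspect code path is to avoid comparing length($a) and length($b) inside reduce at all: compute the maximum length with max, then pick the first word of that length with first (both exported by List::Util). This is a sketch I haven't run on 5.13.4. The word list is hard-coded from the diagnostic output above (an assumption standing in for reading test.txt), and utf8::upgrade gives the strings the UTF-8 flag that the :encoding(UTF-8) layer would give them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max first);

# Words as shown in the diagnostic output above, including the empty fields.
my @list = ( '', '__BE', '', 'bb', 'bbbbbbbbbbbbbbb', '', 'aaaaaa' );
utf8::upgrade($_) for @list;    # give them the UTF-8 flag, as :encoding(UTF-8) would

# Find the maximum length first, then the first word that has it;
# no length($a)/length($b) comparison inside reduce is involved.
my $len     = max map { length($_) } @list;
my $longest = first { length($_) == $len } @list;

print "$longest\n";    # bbbbbbbbbbbbbbb
print "$len\n";        # 15
```

The question's own for loop gives the same answer; this version just keeps the "longest element" intent in two declarative lines.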