Could File::Find::Rule be patched to automatically handle filename character encoding/decoding?
Problem description
Suppose I have a file named æ (Unicode: U+00E6, UTF-8: 0xC3 0xA6) in the current directory. Then I would like to use File::Find::Rule to locate it:
use feature qw(say);
use open qw( :std :utf8 );
use strict;
use utf8;
use warnings;
use File::Find::Rule;
my $fn = 'æ';
my @files = File::Find::Rule->new->name($fn)->in('.');
say $_ for @files;
The output is empty, so apparently this did not work.
If I try to encode the filename first:
use Encode;
my $fn = 'æ';
my $fn_utf8 = Encode::encode('UTF-8', $fn, Encode::FB_CROAK | Encode::LEAVE_SRC);
my @files = File::Find::Rule->new->name($fn_utf8)->in('.');
say $_ for @files;
The output is:
Ã¦
So it found the file, but the returned filename is not decoded into a Perl string. To fix this, I can decode the result, replacing the last line with:
say Encode::decode('UTF-8', $_, Encode::FB_CROAK) for @files;
The question is whether both the encoding and the decoding could (or should) have been done automatically by File::Find::Rule, so that I could have used my original program without having to worry about encoding and decoding at all. (For example, could File::Find::Rule have used I18N::Langinfo to determine that the current locale's codeset is UTF-8?)
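To make the I18N::Langinfo idea concrete, here is a minimal sketch of what such automatic handling might do behind the scenes. The helper names locale_codeset, fs_encode and fs_decode are hypothetical and are not part of File::Find::Rule; the sketch also assumes that every file name on disk really is encoded in the locale's codeset and that Encode recognizes the codeset name the locale reports.

```perl
use strict;
use warnings;
use Encode ();
use I18N::Langinfo qw(langinfo CODESET);
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: ask the environment's locale for its codeset,
# e.g. "UTF-8" under en_US.UTF-8.
sub locale_codeset {
    setlocale(LC_CTYPE, '');    # adopt the locale from the environment
    return langinfo(CODESET);
}

# Hypothetical helper: Perl character string -> bytes for the file system.
# An explicit codeset may be passed; otherwise the locale's codeset is used.
sub fs_encode {
    my ($chars, $codeset) = @_;
    return Encode::encode($codeset // locale_codeset(), $chars,
                          Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Hypothetical helper: bytes from the file system -> Perl character string.
sub fs_decode {
    my ($bytes, $codeset) = @_;
    return Encode::decode($codeset // locale_codeset(), $bytes,
                          Encode::FB_CROAK | Encode::LEAVE_SRC);
}
```

With helpers like these, the original program would pass fs_encode($fn) to name() and apply fs_decode to each returned path. Because FB_CROAK is used, a file name that is not valid in the assumed codeset makes the program die, which is one reason this cannot simply be switched on for everyone.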
Yeah, I wish. If there was a major Perl project I'd work on, this would be it.
The issue is that there could be badly-encoded file names, including file names encoded using a different encoding than expected. That means the first thing needed is a way of round-tripping badly-encoded file names through a decode-encode process. I think Python uses lone surrogate code points (its surrogateescape error handler) to represent the bad bytes.
You would need a pragma to ensure backwards compatibility.
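The round-tripping idea mentioned above can be sketched in Perl as follows. This is only an illustration, assuming UTF-8 is the expected encoding and borrowing the convention of mapping each undecodable byte B to the code point U+DC00 + B; decode_filename and encode_filename are hypothetical names, not an existing API.

```perl
use strict;
use warnings;
no warnings 'utf8';    # lone surrogates are intentional in this sketch
use Encode ();

# Hypothetical helper: file system bytes -> Perl character string.
# Valid UTF-8 is decoded; every other byte B becomes U+DC00 + B.
sub decode_filename {
    my ($bytes) = @_;
    my $chars = '';
    while (length $bytes) {
        # FB_QUIET decodes the valid prefix and leaves the rest in $bytes.
        $chars .= Encode::decode('UTF-8', $bytes, Encode::FB_QUIET);
        if (length $bytes) {
            # The first remaining byte is not valid UTF-8: escape it.
            $chars .= chr(0xDC00 + ord(substr($bytes, 0, 1, '')));
        }
    }
    return $chars;
}

# Hypothetical helper: the inverse of decode_filename.
sub encode_filename {
    my ($chars) = @_;
    my $bytes = '';
    for my $ch (split //, $chars) {
        my $cp = ord $ch;
        $bytes .= ($cp >= 0xDC80 && $cp <= 0xDCFF)
            ? chr($cp - 0xDC00)               # escaped byte: restore it as-is
            : Encode::encode('UTF-8', $ch);   # ordinary character
    }
    return $bytes;
}
```

Strict UTF-8 decoding rejects encoded surrogates, so the escape code points cannot collide with characters that were legitimately present in a file name. A string containing such escapes is still not safe to print or to pass to code that expects well-formed Unicode.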