Could File::Find::Rule be patched to automatically handle filename character encoding/decoding?


Problem description


Suppose I have a file named æ (Unicode: U+00E6, UTF-8: 0xC3 0xA6) in the current directory.

Then, I would like to use File::Find::Rule to locate it:

use feature qw(say);
use open qw( :std :utf8 );
use strict;
use utf8;
use warnings;

use File::Find::Rule;

my $fn = 'æ';
my @files = File::Find::Rule->new->name($fn)->in('.');
say $_ for @files;

The output is empty, so apparently this did not work.

If I try to encode the filename first:

use Encode;

my $fn = 'æ';
my $fn_utf8 = Encode::encode('UTF-8', $fn, Encode::FB_CROAK | Encode::LEAVE_SRC);
my @files = File::Find::Rule->new->name($fn_utf8)->in('.');
say $_ for @files;

The output is:

æ

So it found the file, but the returned filename is not decoded into a Perl string. To fix this, I can decode the result, replacing the last line with:

say Encode::decode('UTF-8', $_, Encode::FB_CROAK) for @files;

The question is whether both the encoding and the decoding could (or should) be done automatically by File::Find::Rule, so that I could use my original program without having to worry about encoding and decoding at all.

(For example, could File::Find::Rule use I18N::Langinfo to determine that the current locale's codeset is UTF-8?)
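To make the idea concrete, here is a minimal sketch of doing by hand what the question asks File::Find::Rule to do automatically. The helper names `locale_codeset`, `fs_encode`, `fs_decode` and `find_by_name` are hypothetical and are not part of File::Find::Rule. The sketch assumes that file names on disk are encoded in the codeset of the current locale, which is not guaranteed.

```perl
use strict;
use warnings;
use utf8;
use Encode;
use I18N::Langinfo qw(langinfo CODESET);
use POSIX qw(setlocale LC_CTYPE);

# Ask the current locale which codeset it uses (for example "UTF-8").
sub locale_codeset {
    setlocale(LC_CTYPE, '');    # adopt the locale from the environment
    return langinfo(CODESET);
}

# Perl character string -> bytes as stored in the file system (assumed codeset).
sub fs_encode {
    my ($string, $codeset) = @_;
    return Encode::encode($codeset, $string, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Bytes returned by the file system -> Perl character string.
sub fs_decode {
    my ($bytes, $codeset) = @_;
    return Encode::decode($codeset, $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Hypothetical wrapper: encode the arguments, search, decode the results.
sub find_by_name {
    my ($name, $dir, $codeset) = @_;
    $codeset //= locale_codeset();
    require File::Find::Rule;    # not a core module, so it is loaded only when needed
    my @found = File::Find::Rule->new
        ->name(fs_encode($name, $codeset))
        ->in(fs_encode($dir, $codeset));
    return map { fs_decode($_, $codeset) } @found;
}
```

With `use open qw( :std :utf8 );` in effect, `say $_ for find_by_name('æ', '.');` would then print decoded names. Because of `FB_CROAK`, the sketch dies on any file name that is not valid in the assumed codeset.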

Solution

Yeah, I wish. If there was a major Perl project I'd work on, this would be it.

The issue is that there could be badly-encoded file names, including file names encoded using a different encoding than expected. That means the first thing needed is a way of round-tripping badly-encoded file names through a decode-encode process. I think Python uses lone surrogate code points (its surrogateescape error handler) to represent the bad bytes.
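As a rough illustration of such a round trip, the sketch below imitates Python's surrogateescape scheme in Perl: every byte that is not part of valid UTF-8 is mapped to a code point in the range U+DC80..U+DCFF, and mapped back to the original byte when encoding. The function names `decode_lossless` and `encode_lossless` are hypothetical. This is not an existing Encode feature, and it assumes the expected encoding is UTF-8.

```perl
use strict;
use warnings;
use Encode;

# Bytes -> characters. Bytes that are not valid UTF-8 become U+DC00 + byte value.
sub decode_lossless {
    my ($bytes) = @_;
    my $out = '';
    while (length $bytes) {
        # FB_QUIET returns the valid prefix and leaves the unprocessed rest in $bytes.
        $out .= Encode::decode('UTF-8', $bytes, Encode::FB_QUIET);
        last unless length $bytes;
        # The first remaining byte is invalid here: remove it and escape it.
        $out .= chr(0xDC00 + ord(substr($bytes, 0, 1, '')));
    }
    return $out;
}

# Characters -> bytes. Escaped code points are turned back into the original bytes.
sub encode_lossless {
    my ($string) = @_;
    return join '', map {
        my $ord = ord($_);
        ($ord >= 0xDC80 && $ord <= 0xDCFF)
            ? chr($ord - 0xDC00)              # restore the raw byte
            : Encode::encode('UTF-8', $_);    # ordinary character
    } split //, $string;
}
```

For example, the byte string `"\xC3\xA6\xFF"` decodes to the two characters U+00E6 and U+DCFF, and `encode_lossless` turns that string back into the same three bytes. Strings containing such code points are only meant to be passed back to the file system. Perl warns if they are printed through a UTF-8 layer.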

You would need a pragma to ensure backwards compatibility.
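One way such an opt-in could look is a lexically scoped pragma built on the hint hash `%^H`, following the pattern described in perlpragma. The pragma name `fnenc` and the function `filename_mode` are invented for this sketch. No such pragma exists, and a real design would have to settle many more details.

```perl
use strict;
use warnings;

package fnenc;    # hypothetical pragma name

# "use fnenc;" switches the behaviour on for the enclosing lexical scope.
sub import   { $^H{fnenc} = 1 }
# "no fnenc;" switches it off again.
sub unimport { $^H{fnenc} = 0 }

# True if the pragma is active in the scope $level frames above our caller.
sub in_effect {
    my $level = shift // 0;
    my $hints = (caller($level))[10];
    return $hints ? $hints->{fnenc} : 0;
}

package main;

# Pretend fnenc.pm has been loaded so that "use fnenc;" works in this single file.
BEGIN { $INC{'fnenc.pm'} = __FILE__ }

# A library function can check whether its caller opted in.
sub filename_mode {
    return fnenc::in_effect(1) ? 'characters' : 'bytes';
}
```

Code that says `use fnenc;` would see `filename_mode()` return `'characters'`, while existing code that never loads the pragma keeps getting `'bytes'`, so old programs would behave as before.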
