Creating filenames with Unicode characters

Question

I am looking for guidelines on how to create filenames that contain Unicode characters. Consider:

use open qw( :std :utf8 );
use strict;
use utf8;
use warnings;

use Data::Dump;
use Encode qw(encode);

my $utf8_file_name1 = encode('UTF-8', 'æ1', Encode::FB_CROAK | Encode::LEAVE_SRC);
my $utf8_file_name2 = 'æ2';
dd $utf8_file_name1;
dd $utf8_file_name2;
qx{touch $utf8_file_name1};
qx{touch $utf8_file_name2};
print (qx{ls æ*});

The output is:

"\xC3\xA61"
"\xE62"
æ1
æ2

Why doesn't it matter whether I encode the filename as UTF-8 or not? (The filename ends up as valid UTF-8 either way.)

Answer

Because of a bug known as "The Unicode Bug". When the string is used as a file name, the equivalent of the following happens:

use Encode qw( encode_utf8 is_utf8 );

my $bytes = is_utf8($str) ? encode_utf8($str) : $str;

is_utf8 checks which of the two string storage formats the scalar uses. This is an internal implementation detail that you should never have to worry about, except where The Unicode Bug is involved.
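To make the two storage formats concrete, here is a minimal sketch (not part of the original answer) that stores the same one-character string both ways. It uses only the built-in utf8::upgrade and utf8::is_utf8 functions; the variable names are illustrative.

```perl
use strict;
use warnings;

# One character, U+00E6. Without "use utf8", "\x{E6}" is stored in the
# downgraded (one byte per character) format.
my $d = "\x{E6}";

# Copy it and switch the copy to the upgraded (internally UTF-8) format.
# Only the storage changes, not the string's value.
my $u = $d;
utf8::upgrade($u);

printf "downgraded copy: is_utf8=%d length=%d\n", (utf8::is_utf8($d) ? 1 : 0), length $d;
printf "upgraded copy:   is_utf8=%d length=%d\n", (utf8::is_utf8($u) ? 1 : 0), length $u;
printf "equal as strings: %s\n", ($d eq $u ? "yes" : "no");
```

The two scalars compare equal and have the same length, yet is_utf8 reports different values for them. That difference is what the Unicode Bug exposes when a string is handed to the operating system.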

Your program works because encode always returns a string for which is_utf8 returns false, and under use utf8; a string literal that contains non-ASCII characters always produces a string for which is_utf8 returns true. The first name is therefore passed through unchanged (it already holds UTF-8 bytes), while the second is implicitly encoded to UTF-8.
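The two cases can be inspected directly. The sketch below is an illustration rather than part of the original answer. It assumes the script file itself is saved as UTF-8, and prints the code points and the is_utf8 result for each of the two names from the question.

```perl
use strict;
use warnings;
use utf8;                      # assumption: this source file is saved as UTF-8
use Encode qw(encode);

my $encoded = encode('UTF-8', 'æ1');   # explicitly encoded: a string of UTF-8 bytes
my $literal = 'æ2';                    # decoded characters from a "use utf8" literal

for my $s ($encoded, $literal) {
    # %vX prints each element of the string as hex, joined by dots,
    # so the output stays plain ASCII.
    printf "%-10s is_utf8=%d\n", sprintf('%vX', $s), (utf8::is_utf8($s) ? 1 : 0);
}
```

The encoded name holds three bytes and is not flagged, so it reaches the system as-is. The literal holds two characters and is flagged, so the implicit encode_utf8 step described above turns it into UTF-8 bytes.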

If you don't encode explicitly, as you should, you will sometimes get the wrong result. For example, had you used "\x{E6}2" instead of 'æ2', you would have gotten a different file name, even though the two strings have the same length and the same characters.
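A minimal sketch of the explicit approach follows. It assumes the file system expects UTF-8 file names, as in the question, and that the script is saved as UTF-8. The helper name fs_bytes is hypothetical and not taken from the original answer.

```perl
use strict;
use warnings;
use utf8;                      # assumption: this source file is saved as UTF-8
use Encode qw(encode);

# Hypothetical helper: turn a string of characters into the bytes that
# should be handed to the file system (assumed here to expect UTF-8).
sub fs_bytes {
    my ($name) = @_;
    return encode('UTF-8', $name, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Same characters, different internal storage formats.
my @names = ('æ2', "\x{E6}2");

for my $name (@names) {
    my $bytes = fs_bytes($name);
    # Show the bytes in hex. Passing $bytes to system('touch', $bytes)
    # or open() would use exactly these bytes as the file name.
    print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
}
```

Because encode works on the characters of the string rather than on its internal storage, both names produce the same bytes, and the resulting file name no longer depends on how the string happened to be built.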

$ dir
total 0

$ perl -wE'
   use utf8;
   $fu="æ";
   $fd="\x{E6}";
   say sprintf "%vX", $_ for $fu, $fd;
   say $fu eq $fd ? "eq" : "ne";
   system("touch", $_) for "u".$fu, "d".$fd
'
E6
E6
eq

$ dir
total 0
-rw------- 1 ikegami ikegami 0 Jul 12 12:18 uæ
-rw------- 1 ikegami ikegami 0 Jul 12 12:18 d?
