Perl Japanese to English filename replacement


Problem description

I put together a Perl script that replaces Japanese file names with English ones. But there are still a couple of things I don't quite understand.

I have the following configuration.

Client:
- OS: Windows XP (Japanese edition)
- Notepad++ installed

Server:
- Red Hat Enterprise Linux Server release 6.2
- Perl v5.10.1
- Vim: version 7.2.411
- Xterm: ASTEC-X version 6.0
- csh: tcsh 6.17.00 (Astron)

The files are Japanese .csv files generated on Windows. I have seen posts about using utf8 and encoding conversion in Perl. I would like to understand why I didn't need any of the things those threads describe.

Here is the script that worked. My questions follow below.

#!/usr/bin/perl
use strict;
use warnings;
# No "use utf8" here: the literal 機 below stays a raw byte sequence,
# which is compared directly with the raw bytes returned by readdir.

my $work_dir = "/nas1_home4/fsomeguy/someplace";
opendir(my $dh, $work_dir) or die "Cannot open directory $work_dir: $!";
my @files = readdir($dh);
closedir($dh);

foreach (@files)
{
    my $original_file = $_;
    s/機/-machine_/;    # replace 機 with -machine_
    my $new_file = $_;
    if ($new_file ne $original_file)
    {
        print "Rename $original_file to $new_file\n";
        rename("${work_dir}/${original_file}", "${work_dir}/${new_file}")
            or print "Warning: rename failed because: $!\n";
    }
}

Questions:

1) Why isn't "use utf8" required in this sample, and in what kinds of examples would I need it? "use utf8" was discussed in the thread "use utf8 gives me 'Wide character in print'". But if I add "use utf8", this script no longer works.

2) Why isn't any encoding conversion required in this sample?
I actually wrote the script on Windows in Notepad++, pasting the Japanese characters into it from Explorer on Japanese Windows XP. In Xterm and Vim the characters show up as garbage. Yet I didn't have to do any encoding conversion either, which was discussed in "How can I convert japanese characters to unicode in Perl?".

Thanks.

UPDATE 1

Testing a simple localization sample in Perl: replacing Japanese text in a file name and in file contents

On Windows XP, copy the 南 character from a .csv data file to the clipboard. Then use it as both the file name (i.e. 南.txt) and the file content (南). In Notepad++, viewing the file as UTF-8 shows x93xEC, while viewing it as Shift_JIS displays 南.
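To see why Notepad++ shows x93xEC, compare the bytes of 南 in the two encodings. This is a minimal sketch using the core Encode module. It assumes the sketch itself is saved as UTF-8 (hence "use utf8"). It also assumes that the name "shift_jis" resolves to Encode's shiftjis encoding, as the error messages in the results below suggest.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                     # the literal 南 below is the character U+5357
use Encode qw(encode);

my $char = "南";
for my $enc ("shift_jis", "UTF-8") {
    my $bytes = encode($enc, $char);   # character string -> byte string
    # print each byte as two hex digits
    printf "%-9s %s\n", $enc, join(" ", map { sprintf "%02X", ord } split //, $bytes);
}
```

The Shift_JIS bytes (93 EC) are what the .csv file contains. A UTF-8 file system stores the name 南.txt using the three UTF-8 bytes (E5 8D 97) instead, so the two need different decodings.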

Script:

The following Perl script, south.pl, is run on a Linux server with Perl 5.10:

#!/usr/bin/perl
use feature qw(say);

use strict;
use warnings;
use utf8;                     # string literals such as 南 in this file are characters
use Encode qw(decode encode);

my $user_dir = "/usr/frank";
my $work_dir = "${user_dir}/test_south";

# forward-declare a subroutine that is defined further down
sub fileProcess;

opendir(DIR, $work_dir) or die "Cannot open directory $work_dir: $!";

# readdir OPTION 1 - shift_jis
#my @files = map { Encode::decode("shift_jis", $_); } readdir DIR; # Note: file name could not be decoded as shift_jis
#binmode(STDOUT, ":encoding(shift_jis)");

# readdir OPTION 2 - utf8
my @files = map { Encode::decode("utf8", $_); } readdir DIR; # Note: file name could be decoded as utf8
binmode(STDOUT, ":encoding(utf8)");                           # write UTF-8 to the terminal

say for @files;

# both subs work on the file-scoped @files and $work_dir; no arguments are passed
fileNameTranslate();
fileProcess();

closedir(DIR);

exit;

sub fileNameTranslate
{
    foreach (@files)
    {
        my $original_file = $_;
        s/南/south/;
        my $new_file = $_;

        if ($new_file ne $original_file)
        {
            print "Rename $original_file to\n\t$new_file\n";
            # encode the character strings back to UTF-8 bytes for the file system
            rename(encode("UTF-8", "${work_dir}/${original_file}"),
                   encode("UTF-8", "${work_dir}/${new_file}"))
                or print "Warning: rename failed because: $!\n";
        }
    }
}

sub fileProcess
{
    # file process OPTION 3: open the files as shift_jis; the search and replace works
    # open(IN1,  "<:encoding(shift_jis)", "${work_dir}/south.txt")  or die "Error: south.txt: $!\n";
    # open(OUT1, "+>:encoding(shift_jis)", "${work_dir}/south1.txt") or die "Error: south1.txt: $!\n";

    # file process OPTION 4: open the files as utf8; the search and replace does not work
    open(IN1,  "<:encoding(utf8)", "${work_dir}/south.txt")  or die "Error: south.txt: $!\n";
    open(OUT1, "+>:encoding(utf8)", "${work_dir}/south1.txt") or die "Error: south1.txt: $!\n";

    while (<IN1>)
    {
        chomp;
        print "$_\n";
        s/南/south/g;
        print OUT1 "$_\n";
    }

    close IN1;
    close OUT1;
}

Results:

(BAD) Options 1 and 3 uncommented (Options 2 and 4 commented out)
Setup: readdir encoding Shift_JIS; file open encoding Shift_JIS
Result: file name replacement failed.
Error: utf8 "\x93" does not map to Unicode at .//south.pl line 68. \x93

(BAD) Options 2 and 4 uncommented (Options 1 and 3 commented out)
Setup: readdir encoding utf8; file open encoding utf8
Result: file name replacement worked and south.txt was generated. However, the content replacement in south1.txt failed; it has the content \x93 ().
Error: "\x{fffd}" does not map to shiftjis at .//south.pl line 25. ... -Ao?= (Bx{fffd}.txt

(GOOD) Options 2 and 3 uncommented (Options 1 and 4 commented out)
Setup: readdir encoding utf8; file open encoding Shift_JIS
Result: file name replacement worked and south.txt was generated. The content replacement in south1.txt also worked; it has the content "south".

Conclusion:

I had to use different encodings for this example to work properly: UTF-8 for readdir, and Shift_JIS for file processing, because the content of the .csv file was Shift_JIS-encoded.
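The GOOD combination can be reproduced without touching the file system. This is a minimal sketch under stated assumptions. The byte strings are hypothetical stand-ins for what readdir returns (the UTF-8 bytes of 南.txt) and for the contents of south.txt (the Shift_JIS bytes 93 EC). An in-memory filehandle replaces the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                   # 南 below is the character U+5357
use Encode qw(decode);

# Hypothetical stand-ins for the real data:
my $name_bytes    = "\xE5\x8D\x97.txt";     # 南.txt as UTF-8 bytes (as readdir returns it)
my $content_bytes = "\x93\xEC\n";           # 南 as Shift_JIS bytes, plus a newline

# File name: decode as UTF-8 (like OPTION 2), then substitute.
my $name = decode("UTF-8", $name_bytes);
(my $new_name = $name) =~ s/南/south/;

# File content: read through a Shift_JIS layer (like OPTION 3).
open(my $in, "<:encoding(shift_jis)", \$content_bytes) or die "open: $!";
my $line = <$in>;
close($in);
chomp $line;
(my $new_line = $line) =~ s/南/south/g;

# Decoding the same content as UTF-8 (like OPTION 4) yields U+FFFD replacement
# characters instead of 南, so the substitution has nothing to match.
my $wrong = decode("UTF-8", $content_bytes);

print "new name:    $new_name\n";
print "new content: $new_line\n";
print "UTF-8 decode contains U+5357: ", ($wrong =~ /南/ ? "yes" : "no"), "\n";
```

The U+FFFD characters in $wrong are consistent with the "\x{fffd}" in the second BAD result above.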

Solution

Your script is completely Unicode-unaware: it treats all strings as sequences of bytes. Fortunately, the bytes encoding the file names are identical to the bytes encoding the Japanese characters in your source code. If you tell Perl to "use utf8", it decodes the Japanese characters in your script into characters. The names coming from the file system are still undecoded bytes, so they no longer match.
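The answer's point can be checked directly. The sketch below assumes a UTF-8 file system and a sketch file saved as UTF-8; $raw_name is a hypothetical file name, not one from the question. The sketch makes three comparisons:

- the byte-level match the original script relied on ("no utf8");
- the failed character-level match under "use utf8";
- the consistent alternative: decode the name, substitute, and encode it back before calling rename().

It also shows the usual fix for the "Wide character in print" warning mentioned in question 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                  # literals in this file are characters
use Encode qw(decode encode);

# Hypothetical raw name as readdir would return it on a UTF-8 file system:
# the three UTF-8 bytes of 機 (U+6A5F) followed by ".txt".
my $raw_name = "\xE6\xA9\x9F.txt";

# Under "use utf8", /機/ is the single character U+6A5F,
# which never occurs in the undecoded byte string.
print "use utf8, raw bytes:    ", ($raw_name =~ /機/ ? "match" : "no match"), "\n";

{
    no utf8;   # as in the original script: 機 is read as its 3 raw bytes
    print "no utf8, raw bytes:     ", ($raw_name =~ /機/ ? "match" : "no match"), "\n";
}

# Consistent alternative: decode, substitute, encode back for rename().
my $char_name = decode("UTF-8", $raw_name);          # 5 characters: 機.txt
print "use utf8, decoded name: ", ($char_name =~ /機/ ? "match" : "no match"), "\n";

(my $new_name = $char_name) =~ s/機/-machine_/;
my $new_bytes = encode("UTF-8", $new_name);          # byte string to pass to rename()

# Printing character strings needs an output layer; without it,
# printing 機 triggers the "Wide character in print" warning.
binmode(STDOUT, ":encoding(UTF-8)");
print "$char_name -> $new_name\n";
```

Either approach works as long as it is consistent. Keep everything as bytes (the original script), or decode everything on input and encode on output (the second half of this sketch). Mixing the two is what breaks the match.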
