How can I convert an input file to UTF-8 encoding in Perl?


Question

I already know how to convert the non-UTF-8-encoded content of a file to UTF-8 line by line, using something like the following code:

use Encode;

# outfile.txt is GB2312-encoded
open my $filter, "<", 'c:/outfile.txt' or die "Cannot open c:/outfile.txt: $!";

while (<$filter>) {
    # decode each line of outfile.txt from GB2312 bytes into a Perl character string
    $_ = Encode::decode("gb2312", $_);
    ...
}

But I think Perl can directly convert the whole input file to UTF-8, so I've tried something like:

# outfile.txt is GB2312-encoded
open my $filter, "<:utf8", 'c:/outfile.txt';

(Perl says something like: utf8 "\xD4" does not map to Unicode)

and

open my $filter,"<",'c:/outfile.txt'; 
$filter = Encode::decode("gb2312", $filter); 

(Perl says "readline() on unopened filehandle")
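A small sketch of why the second attempt fails, assuming current Encode behavior: decode() treats its second argument as a byte string, so a filehandle reference is stringified to something like GLOB(0x...) and that text, not the file's contents, is what gets decoded and assigned back to $filter. An in-memory filehandle stands in for the real file here.

```perl
use strict;
use warnings;
use Encode qw(decode);

# An in-memory filehandle stands in for the opened c:/outfile.txt (assumption).
my $content = "abc\n";
open my $filter, '<', \$content or die "Cannot open in-memory file: $!";

# decode() expects bytes; the filehandle reference is stringified to
# something like "GLOB(0x55d0c8a1e2f0)" and that text is decoded instead.
my $result = decode('gb2312', $filter);

print 'before: ', ref($filter), "\n";                           # a GLOB reference
print 'after:  ', (ref($result) || 'plain string'), " $result\n";
```

Once $filter holds that plain string, Perl no longer has a handle to read from, which is consistent with the "unopened filehandle" warning.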

Neither works. Is there some way to convert the whole input file to UTF-8 directly?

Update:

Looks like things are not as simple as I thought. I can now convert the input file to UTF-8 in a roundabout way: I first open the input file, encode its content as UTF-8, write it to a new file, and then read the new file for further processing. This is the code:

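open my $filter, '<:encoding(gb2312)', 'c:/outfile.txt' or die "Cannot open c:/outfile.txt: $!";
open my $filter_new, '+>:utf8', 'c:/outfile_new.txt' or die "Cannot open c:/outfile_new.txt: $!";
print $filter_new $_ while <$filter>;
seek $filter_new, 0, 0;    # rewind: after printing, the position is at end of file
while (<$filter_new>) {
    ...
}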

But this is too much work, and it is even more troublesome than simply decoding the content of $filter line by line.

Solution

I think I misunderstood your question. I think what you want to do is read a file in a non-UTF-8 encoding, then work with the data as text in your program. That's much easier. Once you read the data with the right encoding (for example through an :encoding(gb2312) layer), Perl holds it internally as a character string, so you can just do what you have to do.
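A minimal sketch of that approach, assuming a GB2312 input file like the one in the question: the :encoding(gb2312) layer decodes each line as it is read, so no intermediate UTF-8 file is needed. The temporary file and its sample content ("中文", U+4E2D U+6587) are hypothetical stand-ins for c:/outfile.txt.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical stand-in for c:/outfile.txt: "中文" stored as GB2312 bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode('gb2312', "\x{4E2D}\x{6587}\n");
close $tmp or die "Cannot close $path: $!";

# The :encoding(gb2312) layer decodes while reading; no second file needed.
open my $filter, '<:encoding(gb2312)', $path or die "Cannot open $path: $!";
my @lines;
while (my $line = <$filter>) {
    chomp $line;
    push @lines, $line;    # $line holds characters, not GB2312 bytes
}
close $filter;

printf "%d line(s); first line has %d characters\n", scalar @lines, length $lines[0];
```

Because length() counts characters here, the two-byte GB2312 sequences show up as single characters.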

When you write it back out, use whatever encoding you want to save it in. However, you don't have to write it back to a file just to use it.
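A sketch of the writing side, assuming UTF-8 is the encoding you want on disk: the output encoding is chosen with a layer on the output handle, and reading the file back in :raw mode shows the bytes that were stored. The text and the temporary file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical decoded text, e.g. a line read through :encoding(gb2312).
my $text = "\x{4E2D}\x{6587}";

my ($out, $out_path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';    # choose the output encoding on the handle
print {$out} $text;
close $out or die "Cannot close $out_path: $!";

# Read the raw bytes back to see what was written.
open my $raw, '<:raw', $out_path or die "Cannot open $out_path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

print join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";   # E4 B8 AD E6 96 87
```

Swapping the layer (for example to :encoding(gb2312)) would store the same text in a different encoding.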


Old answer

The :utf8 I/O layer only reads the data assuming it's already properly encoded as UTF-8. It's not going to convert from another encoding for you. By telling open to use utf8, you're telling it that the file already is UTF-8.
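To make that concrete, a small sketch: the GB2312 bytes for "中文" (a hypothetical sample) are not valid UTF-8, which is what the :utf8 attempt runs into, while naming the real encoding decodes them. Encode::decode with FB_CROAK is used here instead of an I/O layer so the failure can be caught with eval.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# "中文" as GB2312 bytes (hypothetical sample, like the contents of outfile.txt).
my $gb_bytes = "\xD6\xD0\xCE\xC4";

# 0xD6 starts a two-byte UTF-8 sequence, but 0xD0 is not a continuation byte.
my $as_utf8 = eval { decode('UTF-8', $gb_bytes, FB_CROAK | LEAVE_SRC) };
print(defined($as_utf8) ? "valid UTF-8\n" : "not UTF-8: $@");

# Naming the real source encoding decodes the same bytes correctly.
my $as_gb = decode('gb2312', $gb_bytes);
printf "U+%04X U+%04X\n", map { ord($_) } split //, $as_gb;   # U+4E2D U+6587
```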

You have to use the Encode module just as you've shown (unless you want to write your own I/O layer). You can decode bytes into Perl characters, or, if you know the encoding, you can convert from one encoding to another. Since it looks like you already know the encoding, you might want the from_to() function.
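A minimal from_to() sketch, using hypothetical sample bytes: it converts the octets in place from one encoding to another and returns the new length in bytes (undef on failure). Note that the result is still a byte string (UTF-8 octets), not decoded characters; use decode() if you want to treat it as text.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical sample: "中文" as GB2312 bytes.
my $bytes = "\xD6\xD0\xCE\xC4";

# Converts $bytes in place from GB2312 octets to UTF-8 octets.
my $len = from_to($bytes, 'gb2312', 'UTF-8');
die "conversion failed\n" unless defined $len;

print "$len bytes: ", join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";
# 6 bytes: E4 B8 AD E6 96 87
```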

If you're just starting out with Perl and Unicode, go through Juerd's Perl Unicode Advice before you do anything.
