UTF-16 input and output in Perl


Problem description

I am writing a script that takes a UTF-16 encoded text file as input and outputs a UTF-16 encoded text file.

use open "encoding(UTF-16)";

open INPUT, "< input.txt"
   or die "cannot open > input.txt: $!\n";
open(OUTPUT,"> output.txt");

while(<INPUT>) {
   print OUTPUT "$_\n"
}

Let's just say that my program writes everything from input.txt into output.txt.

This WORKS perfectly fine in my cygwin environment, which is using "This is perl 5, version 14, subversion 2 (v5.14.2) built for cygwin-thread-multi-64int"

But in my Windows environment, which is using "This is perl 5, version 12, subversion 3 (v5.12.3) built for MSWin32-x64-multi-thread",

Every line in output.txt except the first is prepended with crazy symbols.

For example:

<FIRST LINE OF TEXT>
਀    ㈀  ㄀Ⰰ ㈀Ⰰ 嘀愀 ㌀ 䌀栀椀愀 䐀⸀⸀⸀  儀甀愀渀最 䠀ഊ<SECOND LINE OF TEXT>
...

Can anyone give some insight on why it works on cygwin but not windows?

EDIT: After printing the I/O layers as suggested:

In Windows environment:

unix
crlf
encoding(UTF-16)
utf8
unix
crlf
encoding(UTF-16)
utf8

In Cygwin environment:

unix
perlio
encoding(UTF-16)
utf8
unix
perlio
encoding(UTF-16)
utf8

The only difference is between the perlio and crlf layers.
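
For reference, a layer listing like the ones above can be produced with the built-in `PerlIO::get_layers`. This is only a minimal sketch: the post does not show the code that was actually used, and the temporary file here is a stand-in for input.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a throwaway temp file stands in for input.txt.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close($tmp_fh);

# Open with the same encoding layer the question requests via "use open".
open(my $fh, '<:encoding(UTF-16)', $tmp_name)
    or die "cannot open $tmp_name: $!\n";

# get_layers returns the layer stack from the bottom up.
my @layers = PerlIO::get_layers($fh);
print "$_\n" for @layers;

close($fh);
```

The second entry in the list is where the two platforms differ: `crlf` on native Windows builds, `perlio` on Cygwin.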

Solution

[ I was going to wait and give a thorough answer, but it's probably better if I give you a quick answer than nothing. ]

The problem is that crlf and the encoding layers are in the wrong order. Not your fault.

For example, say you do print "a\nb\nc\n"; using UTF-16le (since it's simpler and it's probably what you actually want). You'd end up with

61 00 0D 0A 00 62 00 0D 0A 00 63 00 0D 0A 00

instead of

61 00 0D 00 0A 00 62 00 0D 00 0A 00 63 00 0D 00 0A 00
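
The two byte sequences can be reproduced without touching any file layers. The sketch below uses the core `Encode` module and does the newline translation by hand, so it behaves the same on any platform; `hex_dump` is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $text = "a\nb\nc\n";

# Wrong order: encode first, then turn LF into CRLF on the raw bytes.
# This mimics a crlf layer sitting below the encoding layer.
(my $wrong = encode('UTF-16LE', $text)) =~ s/\x0A/\x0D\x0A/g;

# Right order: turn LF into CRLF on characters, then encode.
(my $crlf_text = $text) =~ s/\n/\r\n/g;
my $right = encode('UTF-16LE', $crlf_text);

print hex_dump($wrong), "\n";
print hex_dump($right), "\n";
```

In the first sequence the inserted 0D is a single byte, so every 16-bit code unit after it is shifted by one byte. That misalignment is what shows up as the stray symbols in front of each line after the first.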

I don't think you can get the right results with the open pragma or with binmode, but it can be done using open.

open(my $fh, '<:raw:encoding(UTF-16):crlf', $qfn)

You'll need to append a :utf8 on some older versions of Perl, IIRC.
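
Applied to the copy script from the question, this might look like the sketch below. The answer only shows the layer string for reading; using the same layers with `>` for the output handle is an assumption here, and `copy_utf16` is a hypothetical helper name.

```perl
use strict;
use warnings;

# copy_utf16 is a hypothetical helper name, not from the original post.
sub copy_utf16 {
    my ($in_qfn, $out_qfn) = @_;

    # :raw strips the default layers (including the platform's :crlf),
    # then :crlf is pushed back on top of the encoding layer.
    open(my $in, '<:raw:encoding(UTF-16):crlf', $in_qfn)
        or die "cannot open $in_qfn: $!\n";

    # Assumption: the same layer order is used for writing.
    open(my $out, '>:raw:encoding(UTF-16):crlf', $out_qfn)
        or die "cannot open $out_qfn: $!\n";

    while (my $line = <$in>) {
        print {$out} $line;    # $line already ends in "\n"; do not add another
    }

    close($out) or die "cannot close $out_qfn: $!\n";
    close($in);
    return;
}
```

Calling `copy_utf16('input.txt', 'output.txt')` would then replace the loop in the question. Because `:crlf` is requested explicitly, the output gets CRLF line endings on every platform, not only on Windows.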

It works on cygwin because the crlf layer is only added on Windows. There you'd get

61 00 0A 00 62 00 0A 00 63 00 0A 00
