UTF-16 perl input output
Problem description
I am writing a script that takes a UTF-16 encoded text file as input and outputs a UTF-16 encoded text file.
use open "encoding(UTF-16)";

open INPUT, "< input.txt"
    or die "cannot open > input.txt: $!\n";
open(OUTPUT,"> output.txt");

while(<INPUT>) {
    print OUTPUT "$_\n";
}
Let's just say that my program writes everything from input.txt into output.txt.
This WORKS perfectly fine in my cygwin environment, which is using "This is perl 5, version 14, subversion 2 (v5.14.2) built for cygwin-thread-multi-64int"
But in my Windows environment, which is using "This is perl 5, version 12, subversion 3 (v5.12.3) built for MSWin32-x64-multi-thread",
Every line in output.txt is pre-pended with crazy symbols except the first line.
For example:
<FIRST LINE OF TEXT>
㈀ Ⰰ ㈀Ⰰ 嘀愀 ㌀ 䌀栀椀愀 䐀⸀⸀⸀ 儀甀愀渀最 䠀ഊ<SECOND LINE OF TEXT>
...
Can anyone give some insight on why it works on cygwin but not windows?
EDIT: After printing the encoded layers as suggested.
In Windows environment:
unix crlf encoding(UTF-16) utf8 unix crlf encoding(UTF-16) utf8
In Cygwin environment:
unix perlio encoding(UTF-16) utf8 unix perlio encoding(UTF-16) utf8
The only difference is between the perlio and crlf layer.
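For reference, a layer listing like the ones above can be produced with the built-in PerlIO::get_layers function. The following is only a minimal sketch: the scratch file and the variable names are assumptions made for illustration, and the layers it prints depend on the platform and the Perl build.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; any writable path would do.
my (undef, $path) = tempfile(UNLINK => 1);

open(my $out, '>:encoding(UTF-16)', $path)
    or die "cannot open $path: $!\n";

# PerlIO::get_layers returns the layer stack, bottom layer first.
my @layers = PerlIO::get_layers($out);
print "@layers\n";

close($out);
```

If crlf appears before encoding(UTF-16) in the printed list, the newline translation is applied to the already-encoded bytes.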
Solution
[ I was going to wait and give a thorough answer, but it's probably better if I give you a quick answer than nothing. ]
The problem is that the crlf and the encoding layers are in the wrong order. Not your fault.
For example, say you do
print "a\nb\nc\n";
using UTF-16le (since it's simpler and it's probably what you actually want). You'd end up with
61 00 0D 0A 00 62 00 0D 0A 00 63 00 0D 0A 00
instead of
61 00 0D 00 0A 00 62 00 0D 00 0A 00 63 00 0D 00 0A 00
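The two byte sequences above can be reproduced on any platform by stacking the layers explicitly in both orders. This is a sketch rather than part of the original answer: the bytes_written helper and the scratch files are hypothetical, and it assumes a Perl recent enough to accept crlf pushed on top of an encoding layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text through the given layer stack and return the raw bytes as hex.
# (Hypothetical helper, for illustration only.)
sub bytes_written {
    my ($layers, $text) = @_;
    my (undef, $path) = tempfile(UNLINK => 1);    # scratch file
    open(my $out, ">$layers", $path) or die "cannot open $path: $!\n";
    print {$out} $text;
    close($out) or die "cannot close $path: $!\n";

    # Read the file back with no translation at all.
    open(my $in, '<:raw', $path) or die "cannot open $path: $!\n";
    local $/;                                     # slurp mode
    my $bytes = <$in>;
    close($in);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $text = "a\nb\nc\n";

# crlf below encoding: the order Windows ends up with by default.
my $wrong = bytes_written(':raw:crlf:encoding(UTF-16LE)', $text);

# crlf above encoding: newlines are expanded before the text is encoded.
my $right = bytes_written(':raw:encoding(UTF-16LE):crlf', $text);

print "crlf under encoding: $wrong\n";
print "crlf over encoding:  $right\n";
```

In the first case the 0D byte is inserted into an already-encoded stream, which shifts every following 16-bit unit by one byte; that matches the garbage at the start of each line after the first.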
I don't think you can get the right results with the open pragma or with binmode, but it can be done using open.
open(my $fh, '<:raw:encoding(UTF-16):crlf', $qfn)
You'll need to append a :utf8 with some older versions, IIRC.
It works on cygwin because the crlf layer is only added on Windows. There you'd get
61 00 0A 00 62 00 0A 00 63 00 0A 00
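Applied to the copy loop from the question, the open call from the answer might be used as in the sketch below. The copy_utf16 helper and its parameter names are hypothetical, and using the same layer string for the output handle is an assumption that follows from the answer's reasoning rather than something it states. The loop prints each line as read, because a line returned by the read operator already ends in "\n".

```perl
use strict;
use warnings;

# Copy a UTF-16 text file line by line, with crlf stacked above the
# encoding layer on both handles. (Hypothetical helper name.)
sub copy_utf16 {
    my ($src, $dst) = @_;
    open(my $in, '<:raw:encoding(UTF-16):crlf', $src)
        or die "cannot open < $src: $!\n";
    open(my $out, '>:raw:encoding(UTF-16):crlf', $dst)
        or die "cannot open > $dst: $!\n";
    while (my $line = <$in>) {
        print {$out} $line;    # $line already ends in "\n"
    }
    close($in);
    close($out) or die "cannot close $dst: $!\n";
    return;
}

# Usage: perl copy_utf16.pl input.txt output.txt
copy_utf16(@ARGV) if @ARGV == 2;
```

With this layer order the output always contains CR LF line endings, on Cygwin as well as on Windows, since the crlf layer is requested explicitly instead of coming from the platform defaults.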