How can I decode UTF-16 data in Perl when I don't know the byte order?

Views: 82 | Published: 2020/10/19 19:42:02 | Tags: perl, decode, utf-16


Question

If I open a file (and specify an encoding directly):

open(my $file,"<:encoding(UTF-16)","some.file") || die "error $!\n";
while(<$file>) {
    print "$_\n";
}
close($file);

I can read the file contents nicely. However, if I do:

use Encode;

open(my $file,"some.file") || die "error $!\n";
while(<$file>) {
    print decode("UTF-16",$_);
}
close($file);

I get the following error:

UTF-16:Unrecognised BOM d at F:/Perl/lib/Encode.pm line 174

How can I make it work with decode?

Edit: here are the first several bytes:

FF FE 3C 00 68 00 74 00

Recommended answer

If you simply specify "UTF-16", Perl is going to look for the byte-order mark (BOM) to figure out how to parse it. If there is no BOM, it's going to blow up. In that case, you have to tell Encode which byte-order you have by specifying either "UTF-16LE" for little-endian or "UTF-16BE" for big-endian.
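The BOM check described above can be sketched by hand. This is a minimal illustration, not the answerer's code: `decode_utf16_any` is a hypothetical helper name, and the `UTF-16LE` fallback is only an assumed default for data that carries no BOM.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: choose a decoder based on the first two bytes.
sub decode_utf16_any {
    my ($bytes, $fallback) = @_;
    $fallback //= 'UTF-16LE';    # assumption: caller's best guess when no BOM

    my $bom = substr $bytes, 0, 2;
    if ($bom eq "\xFF\xFE" or $bom eq "\xFE\xFF") {
        # BOM present: plain "UTF-16" consumes it and picks the byte order
        return decode('UTF-16', $bytes);
    }

    # No BOM: decode with the explicitly named byte order
    return decode($fallback, $bytes);
}

# The bytes from the question: FF FE 3C 00 68 00 74 00
my $text = decode_utf16_any("\xFF\xFE\x3C\x00\x68\x00\x74\x00");
print "$text\n";    # prints "<ht"
```

Here `FF FE` is the little-endian BOM, so the remaining bytes decode as `<`, `h`, `t`. When there is no BOM, nothing in the data itself names the byte order, so the fallback stays a guess that the caller has to supply.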

There's something else going on with your situation though, but it's hard to tell without seeing the data you have in the file. I get the same error with both snippets. If I don't have a BOM and I don't specify a byte order, my Perl complains either way. Which Perl are you using and which platform do you have? Does your platform have the native endianness of your file? I think the behaviour I see is correct according to the docs.
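One way to see what the file actually starts with is to dump its first few bytes in hex. The sketch below assumes a hypothetical helper name, `first_bytes_hex`, and reuses the `some.file` path from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n bytes of a file as hex pairs.
sub first_bytes_hex {
    my ($path, $n) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    read $fh, my $buf, $n;    # raw bytes, no layer translation
    close $fh;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $buf;
}

# Only runs when the example file exists
print first_bytes_hex('some.file', 8), "\n" if -e 'some.file';
```

A leading `FF FE` indicates little-endian UTF-16 and `FE FF` indicates big-endian. Anything else means the data has no BOM and the byte order must be named explicitly.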

Also, you can't simply read a line in some unknown encoding (whatever Perl's default is) then ship that off to decode. You might end up in the middle of a multi-byte sequence. You have to use Encode::FB_QUIET to save the part of the buffer that you couldn't decode and add that to the next chunk of data:

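use Encode;

open my $lefh, '<:raw', 'text-utf16.txt' or die "error $!\n";

my $string = '';
while ( defined( my $chunk = <$lefh> ) ) {
    $string .= $chunk;
    # FB_QUIET leaves any undecoded trailing bytes in $string for the next pass
    print decode("UTF-16LE", $string, Encode::FB_QUIET);
}
close $lefh;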
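If the file is small enough to hold in memory, an alternative is to read all the bytes first and decode once, so that no chunk can end in the middle of a character. This is a sketch under that assumption. `slurp_utf16` is a hypothetical helper name, and the default `UTF-16` encoding assumes the file begins with a BOM.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: slurp a file as raw bytes and decode it in one call.
sub slurp_utf16 {
    my ($path, $encoding) = @_;
    $encoding //= 'UTF-16';    # assumption: the file starts with a BOM

    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $bytes = do { local $/; <$fh> };    # undef $/ reads the whole file
    close $fh;

    return decode($encoding, $bytes // '');
}

# Only runs when the example file exists
if (-e 'some.file') {
    binmode STDOUT, ':encoding(UTF-8)';
    print slurp_utf16('some.file');
}
```

Reading raw bytes line by line is fragile for little-endian data because a newline is the byte pair `0A 00`: the read stops after `0A`, and the `00` ends up at the start of the next chunk. Decoding the whole buffer at once sidesteps that, at the cost of memory.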
