How do I load a UTF16-encoded text file in Julia?

Posted: 2020/4/25 4:43:07. Tags: unicode, encoding, julia

Question

I have a text file that I am pretty sure is encoded in UTF-16, but I don't know how to load it in Julia. Do I have to load it as bytes and then convert with UTF16String?

Recommended answer

The simplest way is to read the file as bytes and then convert:

s = open(filename, "r") do f
    utf16(readbytes(f))
end

Note that utf16 also checks for a byte-order mark (BOM), so it handles endianness issues and does not include the BOM in the resulting s.
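As far as I know, readbytes and utf16 belong to early Julia releases (0.4 and earlier) and are no longer in Base on Julia 1.x. The sketch below shows one way the same BOM-aware conversion could be written with read, reinterpret and transcode. The helper name decode_utf16 is hypothetical, the sample file is created only for illustration, and treating BOM-less input as native-endian is an assumption of this sketch.

```julia
# Hypothetical helper (not part of Base): decode UTF-16 bytes into a String,
# honoring a leading byte-order mark if one is present.
function decode_utf16(bytes::Vector{UInt8})
    iseven(length(bytes)) ||
        throw(ArgumentError("UTF-16 data must have an even number of bytes"))
    # Copy the bytes into 16-bit code units in the machine's native byte order.
    units = collect(reinterpret(UInt16, bytes))
    if !isempty(units)
        if units[1] == 0xfeff        # BOM reads correctly: data is native-endian
            popfirst!(units)
        elseif units[1] == 0xfffe    # BOM reads byte-swapped: opposite endianness
            units .= bswap.(units)
            popfirst!(units)
        end
        # No BOM: assume native-endian data (an assumption of this sketch).
    end
    return transcode(String, units)
end

# Illustration only: write a small UTF-16LE file (BOM followed by "hi").
path, io = mktemp()
write(io, UInt8[0xff, 0xfe, 0x68, 0x00, 0x69, 0x00])
close(io)

s = decode_utf16(read(path))
```

The result s is an ordinary UTF-8 String, so no UTF16String type is involved.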

If you really want to avoid making a copy of the data, and you know it is native-endian, that is possible too. However, you have to write a NUL terminator explicitly, because Julia UTF-16 string data internally ends with a NUL code point so that it can be passed to C routines that expect NUL-terminated data:

s = open(filename, "r") do f
    b = readbytes(f)
    resize!(b, length(b)+2)
    b[end] = b[end-1] = 0
    UTF16String(reinterpret(UInt16, b))
end

However, a typical UTF-16 text file starts with a BOM, and in that case the string s will include the BOM as its first character, which may not be what you want.
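On Julia 1.x, String is UTF-8, so producing a String from UTF-16 data always involves a conversion. A rough counterpart of the native-endian shortcut is to read the 16-bit code units directly with read! and pass them to transcode, which needs no NUL terminator. The sketch below assumes native-endian data and uses a sample file created only for illustration. It shows the BOM surviving as the first character, and one way to drop it.

```julia
# Illustration only: write a native-endian UTF-16 file (BOM followed by "hi").
path, io = mktemp()
write(io, UInt16[0xfeff, 0x0068, 0x0069])
close(io)

# Read the code units straight into a Vector{UInt16} (native byte order).
units = Vector{UInt16}(undef, filesize(path) ÷ 2)
open(path, "r") do f
    read!(f, units)
end

# Converting as-is keeps the BOM as the first character of the string.
raw = transcode(String, units)

# Drop a leading BOM in place before converting, if one is present.
if !isempty(units) && units[1] == 0xfeff
    popfirst!(units)
end
s = transcode(String, units)
```

Here raw starts with the character U+FEFF, while s contains only the text.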
