How do I load a UTF-16-encoded text file in Julia?
Question
I have a text file that I am pretty sure is encoded in UTF-16, but I don't know how to load it in Julia. Do I have to load it as bytes and then convert with UTF16String?
Recommended answer
The simplest way is to read it as bytes and then convert:
s = open(filename, "r") do f
    utf16(readbytes(f))
end
Note that utf16 also checks for a byte-order mark (BOM), so it deals with endianness issues and does not include the BOM in the resulting s.
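The calls above come from older Julia releases. In current releases, as far as I know, readbytes and utf16 are no longer in Base, and UTF16String lives in the LegacyStrings.jl package. The following is a minimal sketch of the same idea using only Base functions (read, reinterpret, bswap, transcode). The helper name read_utf16 is hypothetical and not part of any library, and the sketch assumes the whole file fits in memory and contains only UTF-16 code units.

```julia
# Hypothetical helper, not part of Base or of the original answer.
# Reads a UTF-16 file, honors a leading BOM, and returns a regular String.
function read_utf16(filename::AbstractString)
    bytes = read(filename)                       # whole file as Vector{UInt8}
    isodd(length(bytes)) &&
        throw(ArgumentError("odd number of bytes; not valid UTF-16"))
    units = collect(reinterpret(UInt16, bytes))  # native-endian code units
    if !isempty(units)
        # A BOM that reads as 0xfffe means the file has the opposite
        # endianness from this machine, so swap every code unit.
        if units[1] == 0xfffe
            units .= bswap.(units)
        end
        # Drop the BOM so it does not become the first character.
        if units[1] == 0xfeff
            popfirst!(units)
        end
    end
    return transcode(String, units)              # UTF-16 code units -> UTF-8 String
end
```

Calling read_utf16("data.txt") (the file name is only an example) should then return an ordinary String without the BOM. If there is no BOM, the sketch assumes native-endian data, which is an assumption of this example rather than a rule.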
If you really want to avoid making a copy of the data, and you know it is native-endian, that is possible too. However, you have to write a NUL terminator explicitly, because Julia UTF-16 string data internally ends with a NUL code point so that it can be passed to C routines that expect NUL-terminated data:
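s = open(filename, "r") do f
    b = readbytes(f)
    # append two zero bytes, i.e. one UInt16 NUL terminator
    resize!(b, length(b)+2)
    b[end] = b[end-1] = 0
    # reuse the byte buffer as UInt16 code units without copying
    UTF16String(reinterpret(UInt16, b))
end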
However, typical UTF-16 text files start with a BOM, and in that case the string s will include the BOM as its first character, which may not be what you want.
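To illustrate the point about the BOM, here is a small sketch for current Julia versions using Base's transcode, which converts the code units as they are and so keeps a leading BOM as the character U+FEFF. The helper name strip_bom is hypothetical, and the code units are written out by hand rather than read from a file.

```julia
# Native-endian UTF-16 code units: a BOM followed by "hi".
units = UInt16[0xfeff, 0x0068, 0x0069]

# Plain transcoding keeps the BOM as the first character of the string.
s = transcode(String, units)

# Hypothetical helper: drop a leading BOM if there is one.
strip_bom(str::AbstractString) =
    startswith(str, '\ufeff') ? str[nextind(str, 1):end] : str

t = strip_bom(s)
```

Here first(s) is '\ufeff', while t holds only the text that follows the BOM.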