UTF-16 file seeking in Python. How?
Problem description
For some reason I cannot seek in my UTF-16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code:
    import codecs

    f = codecs.open(ai_file, 'r', 'utf-16')
    seek = self.ai_map[self._cbClass.Text]  # seek is a valid int
    f.seek(seek)
    while True:
        ln = f.readline().strip()
I tried random things, like first reading something from the stream, but that didn't help. I checked the offset being seeked to in a hex editor: the string starts at a character, not a null byte (I guess that's a good sign, right?). So how do I seek in a UTF-16 file in Python?
Recommended answer
Well, the error message is telling you why: it's not reading a byte order mark (BOM). The BOM is at the beginning of the file. Without having read it, the UTF-16 decoder can't know what order the bytes are in. Apparently it does this lazily, the first time you read, instead of when you open the file, or else it is assuming that the seek() is starting a new UTF-16 stream.
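To check which case applies, the first two bytes of the file can be inspected in binary mode. This is only a minimal sketch: the helper name sniff_utf16_byte_order is made up for illustration, and it assumes the file is UTF-16 (a UTF-32 little-endian BOM begins with the same two bytes, so this check alone cannot tell them apart).

```python
import codecs


def sniff_utf16_byte_order(path):
    """Return 'utf-16-le', 'utf-16-be', or None if the file has no UTF-16 BOM.

    Hypothetical helper for illustration; it assumes the file is UTF-16.
    """
    # Read raw bytes so that no decoder gets involved yet.
    with open(path, "rb") as raw:
        head = raw.read(2)
    if head == codecs.BOM_UTF16_LE:  # b'\xff\xfe'
        return "utf-16-le"
    if head == codecs.BOM_UTF16_BE:  # b'\xfe\xff'
        return "utf-16-be"
    return None
```

A return value of None means there is no BOM to read, so the byte order has to come from knowledge of how the file was produced.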
If your file doesn't have a BOM, that's definitely the problem, and you should specify the byte order when opening the file (see #2 below). Otherwise, I see two potential solutions:
1. Read the first two bytes of the file to get the BOM before you seek. You seem to say this didn't work, which suggests that it may be expecting a fresh UTF-16 stream after the seek, so:
2. Specify the byte order explicitly by using utf-16-le or utf-16-be as the encoding when you open the file.
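As a rough sketch of option 2, the snippet below opens the file with an explicit byte order and then seeks. The names read_line_at and make_demo_file are hypothetical, the demo file is assumed to be little-endian, and the offset is assumed to be a byte offset that falls on a 2-byte code unit boundary, as in the question.

```python
import codecs
import os
import tempfile


def read_line_at(path, byte_offset, encoding="utf-16-le"):
    """Seek to a byte offset and return one decoded, stripped line.

    Assumes byte_offset is on a 2-byte boundary and that the file's
    actual byte order matches `encoding`.
    """
    with codecs.open(path, "r", encoding) as f:
        f.seek(byte_offset)
        return f.readline().strip()


def make_demo_file():
    """Write a small little-endian UTF-16 file with a BOM.

    Returns (path, byte offset of the second line). Demo data only.
    """
    first, second = u"first line\n", u"second line\n"
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as raw:
        raw.write(codecs.BOM_UTF16_LE)
        raw.write(first.encode("utf-16-le"))
        raw.write(second.encode("utf-16-le"))
    offset = len(codecs.BOM_UTF16_LE) + len(first.encode("utf-16-le"))
    return path, offset


if __name__ == "__main__":
    demo_path, second_line_offset = make_demo_file()
    print(read_line_at(demo_path, second_line_offset))
    os.remove(demo_path)
```

One thing to keep in mind with an explicit byte order: the decoder no longer consumes the BOM, so reading from offset 0 yields a leading U+FEFF character that may need to be stripped.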