UTF-16 file seeking in Python. How?

Problem description

For some reason I cannot seek within my UTF-16 file. It raises 'UnicodeError: UTF-16 stream does not start with BOM'. My code:

import codecs

f = codecs.open(ai_file, 'r', 'utf-16')
seek = self.ai_map[self._cbClass.Text]  # seek is a valid int (byte offset)
f.seek(seek)
while True:
    ln = f.readline().strip()

I tried random things, such as first reading something from the stream, but that didn't help. I checked the offset I seek to in a hex editor: the string starts at a character, not a null byte (I guess that's a good sign, right?). So how do I seek in a UTF-16 file in Python?

Recommended answer

Well, the error message is telling you why: it's not reading a byte order mark. The byte order mark is at the beginning of the file. Without having read the byte order mark, the UTF-16 decoder can't know what order the bytes are in. Apparently it does this lazily, the first time you read, instead of when you open the file -- or else it is assuming that the seek() is starting a new UTF-16 stream.
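
A small experiment can help tell the two explanations apart. The sketch below is not the asker's code: it assumes a throwaway temporary file and uses codecs.getreader("utf-16") on a binary file object in place of codecs.open. On the CPython versions I have looked at, the first read succeeds and the read after seek() fails, which points to the reader starting over and expecting a BOM after each seek.

```python
import codecs
import os
import tempfile

# Hypothetical demo file; the plain "utf-16" codec writes a BOM first.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "w", encoding="utf-16", newline="\n") as out:
    out.write("first\nsecond\n")

with open(demo_path, "rb") as raw_file:
    reader = codecs.getreader("utf-16")(raw_file)
    first_line = reader.readline()  # the BOM is consumed during this read
    reader.seek(14)                 # 2-byte BOM + 6 characters * 2 bytes each
    try:
        reader.readline()           # the reader looks for a BOM again here
        error_message = None
    except UnicodeError as exc:
        error_message = str(exc)

os.remove(demo_path)
print(repr(first_line))
print(error_message)
```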

If your file doesn't have a BOM, that's definitely the problem and you should specify the byte order when opening the file (see #2 below). Otherwise, I see two potential solutions:

  1. Read the first two bytes of the file to get the BOM before you seek. You seem to say this didn't work, indicating that perhaps it's expecting a fresh UTF-16 stream after the seek, so:

  2. Specify the byte order explicitly by using utf-16-le or utf-16-be as the encoding when you open the file.
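
The two suggestions can be combined: read the first two bytes yourself to learn the byte order, then decode with an explicit-endian codec that never looks for a BOM. The following is a minimal sketch under some assumptions: the helper names detect_utf16_codec and read_line_at are made up for illustration, the offset is a byte offset on an even (code-unit) boundary, and the demo file is a temporary stand-in for the real one.

```python
import codecs
import os
import tempfile


def detect_utf16_codec(raw, default="utf-16-le"):
    """Pick an explicit-endian codec name from the first bytes of a file."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return default  # no BOM: the caller has to know the byte order


def read_line_at(path, offset):
    """Return the stripped line starting at byte `offset` of a UTF-16 file."""
    with open(path, "rb") as raw_file:
        codec_name = detect_utf16_codec(raw_file.read(2))
        reader = codecs.getreader(codec_name)(raw_file)
        reader.seek(offset)  # absolute byte offset, not a character index
        return reader.readline().strip()


# Hypothetical demo file written with a BOM in the platform's byte order.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "w", encoding="utf-16", newline="\n") as out:
    out.write("first\nsecond\n")

# 2 bytes of BOM + 6 characters ("first\n") * 2 bytes each = offset 14
second_line = read_line_at(demo_path, 14)
os.remove(demo_path)
print(second_line)
```

Because utf-16-le and utf-16-be do not treat a leading BOM specially, seeking to offset 0 with this approach would return the BOM as a U+FEFF character at the start of the first line; skip the first two bytes in that case.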
