UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence
Question
I am using Python 3.4 on a 64-bit Windows 7 system. I ran the following code:
6 """ load single batch of cifar """
7 with open(filename, 'r') as f:
----> 8 datadict = pickle.load(f)
9 X = datadict['data']
The error message is UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence.
I changed line 7 to:
6 """ load single batch of cifar """
7 with open(filename, 'r',encoding='utf-8') as f:
----> 8 datadict = pickle.load(f)
9 X = datadict['data']
The error message became UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte.
The traceback finally points to decode(self, input, final) in Python34\lib\codecs.py:
311 # decode input (taking the buffer into account)
312 data = self.buffer + input
--> 313 (result, consumed) = self._buffer_decode(data, self.errors, final)
314 # keep undecoded input until the next call
315 self.buffer = data[consumed:]
I then changed the code to:
6 """ load single batch of cifar """
7 with open(filename, 'rb') as f:
----> 8 datadict = pickle.load(f)
9 X = datadict['data']
10 Y = datadict['labels']
This time the error is UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128).
What is the problem, and how do I solve it?
Recommended answer
Pickle files are binary data files, so you always have to open the file in 'rb' mode when loading. Don't try to use a text mode here.
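As a rough illustration of why text mode fails, the sketch below writes a small pickle to a temporary file and tries both modes. The payload and the temporary file are stand-ins invented for this example, not the CIFAR batch from the question, and the exact exception in text mode can depend on the encoding used.

```python
import os
import pickle
import tempfile

# Stand-in payload and file; hypothetical, not the real CIFAR batch.
payload = {"data": [1, 2, 3]}
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump(payload, tmp, protocol=2)  # protocol 2 streams start with byte 0x80
    path = tmp.name

try:
    # Text mode: the file object tries to decode the raw pickle bytes as text.
    try:
        with open(path, "r", encoding="utf-8") as f:
            pickle.load(f)
        text_mode_error = None
    except (UnicodeDecodeError, TypeError) as exc:
        text_mode_error = exc

    # Binary mode: pickle receives the bytes untouched.
    with open(path, "rb") as f:
        loaded = pickle.load(f)
finally:
    os.remove(path)

print(type(text_mode_error).__name__)
print(loaded)
```

The first byte of the stream, 0x80, is the pickle protocol marker, which is why both the gbk and utf-8 attempts in the question fail at position 0.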
You are trying to load a Python 2 pickle that contains string data. You'll have to tell pickle.load() how to convert that data to Python 3 strings, or to leave them as bytes.
The default is to try to decode those strings as ASCII, and that decoding fails. See the pickle.load() documentation:
Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects.
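To see the effect of the encoding argument without a Python 2 installation, here is a minimal sketch that uses a hand-written pickle stream. The byte string py2_pickle is an assumption made for this example: it mimics what Python 2 would emit for the 8-bit string '\x80\x8b\xff' at protocol 2 and is not taken from the CIFAR file.

```python
import pickle

# Hand-written protocol-2 stream for a Python 2 str (illustrative only):
# PROTO 2, SHORT_BINSTRING ('U') with length 3, three raw bytes, STOP ('.').
py2_pickle = b"\x80\x02U\x03\x80\x8b\xff."

# Default encoding is ASCII, so bytes >= 0x80 cannot be decoded.
try:
    pickle.loads(py2_pickle)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc

# latin1 maps every byte 0-255 to the code point of the same value, so it never fails.
as_text = pickle.loads(py2_pickle, encoding="latin1")

# 'bytes' skips decoding and returns a bytes object instead.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(default_error)
print(repr(as_text), repr(as_bytes))
```

Because latin1 decoding is lossless, as_text.encode('latin1') recovers the original bytes if they are needed later.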
Setting the encoding to latin1 allows you to import the data directly:
with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='latin1')
It appears that the numpy array data is what causes the problem here, as all strings in the set use ASCII characters only.
The alternative would be to use encoding='bytes', but then all the filenames and top-level dictionary keys are bytes objects, and you'd have to decode those or prefix all your key literals with b.
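Both ways of handling the bytes keys can be sketched as follows. The decode_keys helper and the small datadict below are hypothetical stand-ins for what pickle.load(f, encoding='bytes') might return; the real batch holds numpy arrays and more entries.

```python
def decode_keys(mapping, encoding="ascii"):
    """Return a copy of mapping with bytes keys decoded to str (hypothetical helper)."""
    return {
        (key.decode(encoding) if isinstance(key, bytes) else key): value
        for key, value in mapping.items()
    }

# Stand-in for a dictionary loaded with encoding='bytes'.
datadict = {
    b"data": b"\x80\x8b\x00",
    b"labels": [6, 9],
    b"filenames": [b"a.png", b"b.png"],
}

# Option 1: prefix the key literal with b.
X = datadict[b"data"]

# Option 2: decode the top-level keys once, then use ordinary str keys.
decoded = decode_keys(datadict)
Y = decoded["labels"]

# Values such as the filenames are still bytes and need their own decoding.
filenames = [name.decode("ascii") for name in decoded["filenames"]]
```

Note that the helper only touches the top-level keys; nested values are left as they were loaded.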