UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence


Problem Description

I am using Python 3.4 on a 64-bit Windows 7 system. I ran the following code:

      6   """ load single batch of cifar """
      7   with open(filename, 'r') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']

The error message was UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: illegal multibyte sequence.

I changed line 7 to:

      6   """ load single batch of cifar """
      7   with open(filename, 'r',encoding='utf-8') as f:
----> 8     datadict = pickle.load(f)
      9     X = datadict['data']

The error message became UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte.

The traceback finally points into Python34\lib\codecs.py, in decode(self, input, final):

    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

I further changed the code to:

      6 """ load single batch of cifar """ 
      7 with open(filename, 'rb') as f:
----> 8 datadict = pickle.load(f) 
      9 X = datadict['data'] 10 Y = datadict['labels']

Well, this time it is UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128).

What is the problem, and how can I solve it?

Recommended Answer

Pickle files are binary data files, so you always have to open the file in 'rb' mode when loading. Don't try to use a text mode here.
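
To illustrate, here is a minimal round-trip sketch (the file name example.pkl is just a placeholder for this example): pickle data is written with 'wb' and read back with 'rb'.

import pickle

data = {'data': [1, 2, 3], 'labels': [0, 1, 0]}

with open('example.pkl', 'wb') as f:   # write in binary mode
    pickle.dump(data, f)

with open('example.pkl', 'rb') as f:   # read back in binary mode
    loaded = pickle.load(f)

assert loaded == data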

You are trying to load a Python 2 pickle that contains string data. You'll have to tell pickle.load() how to convert that data to Python 3 strings, or leave it as bytes.

The default is to try to decode those strings as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects.

Setting the encoding to latin1 allows you to import the data directly:

with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='latin1') 
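
As a usage sketch only (assuming the standard CIFAR-10 batch layout, where 'data' is a 10000x3072 uint8 array and 'labels' is a list of 10000 integers; the helper name load_cifar_batch is mine), the loader from the question could then look like this:

import pickle
import numpy as np

def load_cifar_batch(filename):
    """Load a single CIFAR batch pickled by Python 2."""
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='latin1')
    X = np.array(datadict['data'])    # pixel data, assumed shape (10000, 3072)
    Y = np.array(datadict['labels'])  # integer labels, assumed length 10000
    return X, Y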

It appears that it is the numpy array data that is causing the problem here, as all strings in the set use ASCII characters only.

The alternative would be to use encoding='bytes', but then all the filenames and top-level dictionary keys are bytes objects, and you'd have to decode those or prefix all your key literals with b.
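
A short sketch of that alternative, reusing the 'data' and 'labels' keys from the code in the question (note the b prefix on the key literals):

import pickle

with open(filename, 'rb') as f:
    datadict = pickle.load(f, encoding='bytes')

# Python 2 str keys come back as bytes objects, so key literals need a b prefix
X = datadict[b'data']
Y = datadict[b'labels']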
