UnicodeDecodeError when using Python 2.7 code on Python 3.7 with cPickle

Question

I am trying to use cPickle on a .pkl file built from a "parsed" .csv file. The parsing is done with a pre-built Python toolbox that was recently ported from Python 2 to Python 3 (https://github.com/GEMScienceTools/gmpe-smtk).

The code I'm using is as follows:

from smtk.parsers.esm_flatfile_parser import ESMFlatfileParser
parser=ESMFlatfileParser.autobuild("Database10","Metadata10","C:/Python37/TestX10","C:/Python37/NorthSea_Inc_SA.csv")
import cPickle
sm_database = cPickle.load(open("C:/Python37/TestX10/metadatafile.pkl","r"))

It returns the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 44: character maps to <undefined>

From what I can gather, I need to specify the encoding of my .pkl file for cPickle to work, but I do not know what encoding the file produced by parsing the .csv file uses, so I currently can't load it with cPickle.

I opened the file in Sublime Text, which reports it as "hexadecimal", but that is not an accepted encoding in Python 3.7, is it?

If anyone knows how to determine the required encoding, or how to make hexadecimal encoding usable in Python 3.7, their help would be much appreciated.

P.S. The modules used, such as "ESMFlatfileParser", are part of a pre-built toolbox. Given that, might I also need to change the encoding somewhere inside this module?

Recommended answer

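The code opens the file in text mode ('r'), but it should open it in binary mode ('rb'). A pickle is a byte stream rather than text, so in text mode Python tries to decode it with the platform's default text codec, which is where the 'charmap' error comes from.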
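The difference between the two modes can be reproduced without the toolbox. The following is a minimal sketch, not the asker's actual data: the file name demo.pkl, the pickled dictionary, and the helper names are made up for illustration, and cp1252 is only an assumed stand-in for the Windows code page behind the 'charmap' message.

```python
import os
import pickle
import tempfile


def read_as_text(path, encoding="cp1252"):
    """Read the file in text mode; return the decode error message, or None."""
    try:
        with open(path, "r", encoding=encoding) as f:
            f.read()
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def load_pickle(path):
    """Load a pickle the way pickle.load expects: from a binary-mode file."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pkl")  # hypothetical file name
    original = {"raw": b"\x81"}  # puts a 0x81 byte into the pickle stream
    with open(demo_path, "wb") as f:
        pickle.dump(original, f)

    text_error = read_as_text(demo_path)  # text mode: decoding fails
    restored = load_pickle(demo_path)     # binary mode: loads cleanly

print(text_error)
print(restored == original)
```

The text-mode read fails before pickle gets a chance to interpret anything, which is why guessing an encoding for open does not help.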

From the documentation for pickle.load (emphasis mine):

[The] file can be an on-disk file opened for binary reading, an io.BytesIO object, or any other custom object that meets this interface.

Since the file is opened in binary mode, there is no need to pass an encoding argument to open. It may still be necessary to pass an encoding argument to pickle.load. From the same documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects. Using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2.
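To illustrate what the encoding argument changes, here is a small sketch. The byte string below is a hand-written protocol 2 stream imitating what Python 2 would emit for the 8-bit str 'caf\xe9'; it is an assumption made for the demonstration, not data from the question.

```python
import pickle

# Hand-written stream (assumed, for illustration): PROTO 2, SHORT_BINSTRING
# holding the 4 bytes c, a, f, 0xE9, then BINPUT 0 and STOP.
PY2_STYLE_PAYLOAD = b"\x80\x02U\x04caf\xe9q\x00."


def try_default(payload):
    """Unpickle with the default encoding ('ASCII'); return the error if it fails."""
    try:
        return pickle.loads(payload)
    except UnicodeDecodeError as exc:
        return exc


# Default settings reject the non-ASCII byte 0xE9.
default_result = try_default(PY2_STYLE_PAYLOAD)

# latin1 maps every byte to a code point, so decoding cannot fail.
as_text = pickle.loads(PY2_STYLE_PAYLOAD, encoding="latin1")

# 'bytes' skips decoding and returns the raw 8-bit string.
as_bytes = pickle.loads(PY2_STYLE_PAYLOAD, encoding="bytes")

print(type(default_result).__name__)
print(repr(as_text), repr(as_bytes))
```

Whether the toolbox's .pkl file needs any of this depends on whether it was written by Python 2; a file written by the Python 3 port should load with the defaults.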

Opening the file in binary mode should prevent the UnicodeDecodeError:

# Open in binary mode and close the file once loading is done
with open("C:/Python37/TestX10/metadatafile.pkl", "rb") as f:
    sm_database = cPickle.load(f)
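A side note on the module name: the Python 3 standard library has no separate cPickle module, and the regular pickle module uses its C implementation automatically when available. How `import cPickle` resolved on 3.7 is not shown in the question. For code that must run on both interpreters, a common pattern is the fallback import sketched below; the sample dictionary is invented for the example, and io.BytesIO stands in for a file opened with 'rb'.

```python
import io

try:
    import cPickle as pickle  # Python 2: the C-accelerated module
except ImportError:
    import pickle  # Python 3: pickle already uses the C accelerator

# io.BytesIO is one of the binary file-like objects pickle.load accepts.
sample = {"records": [1, 2, 3]}  # made-up data for the round trip
buffer = io.BytesIO(pickle.dumps(sample))
loaded = pickle.load(buffer)

print(loaded == sample)
```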
