Pandas compiled from source: default pickle behavior changed


Question

I've just compiled and installed pandas from source (cloned the GitHub repo, then ran setup.py install).

As a result, the default behavior of the pickle module for object serialization and deserialization changed, most likely because it is partially overridden by pandas internal modules.

I have quite a few data classes serialized via "standard" pickle which apparently I can no longer deserialize. In particular, when I try to deserialize a file that is known to be valid, I get this error:

In [1]: import pickle

In [2]: pickle.load(open('pickle_L1cor_s1.pic','rb'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-88719f8f9506> in <module>()
----> 1 pickle.load(open('pickle_L1cor_s1.pic','rb'))

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load(file)
   1376
   1377 def load(file):
-> 1378     return Unpickler(file).load()
   1379
   1380 def loads(str):

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/home/acorbe/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.12.0_1090_g46008ec-py2.7-linux-x86_64.egg/pandas/compat/pickle_compat.pyc in load_reduce(self)
     28
     29         # try to reencode the arguments
---> 30         if self.encoding is not None:
     31             args = tuple([ arg.encode(self.encoding) if isinstance(arg, string_types)     else arg for arg in args ])
     32             try:

AttributeError: Unpickler instance has no attribute 'encoding'

I have a fairly large code base that relies on this and is now broken. Is there a quick workaround? How can I get the default pickle behavior back?

Any help is appreciated.

I realized that what I am trying to unpickle is a list of dicts, each of which includes a couple of DataFrames. That's where pandas comes into play.

I applied the patch by @Jeff (github.com/pydata/pandas/pull/5661). Another error, possibly related to this one, shows up:

In [4]: pickle.load(open('pickle_L1cor_s1.pic','rb'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-88719f8f9506> in <module>()
----> 1 pickle.load(open('pickle_L1cor_s1.pic','rb'))

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load(file)
   1376
   1377 def load(file):
-> 1378     return Unpickler(file).load()
   1379
   1380 def loads(str):

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/home/acorbe/Canopy/appdata/canopy-1.1.0.1371.rh5-x86_64/lib/python2.7/pickle.pyc in load_reduce(self)
   1131         args = stack.pop()
   1132         func = stack[-1]
-> 1133         value = func(*args)
   1134         stack[-1] = value
   1135     dispatch[REDUCE] = load_reduce

TypeError: _reconstruct: First argument must be a sub-type of ndarray

The pandas version used to pickle the data (from the Canopy package manager) is:

Size: 7.32 MB
Version: 0.12.0
Build: 2
Dependencies:
 numpy 1.7.1
 python_dateutil
 pytz 2011n

  md5: 7dd4385bed058e6ac15b0841b312ae35

I am not sure I can provide a minimal example of the files I am trying to unpickle. They are quite large (O(100 MB)) and have some non-trivial dependencies.

Recommended answer

Master has just been updated by this issue.

Read the file with:

 result = pd.read_pickle('pickle_L1cor_s1.pic')
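
As a rough illustration of the call above, here is a minimal round-trip sketch. It assumes a current pandas under Python 3 (the question used Python 2.7), and the file name and data are hypothetical stand-ins for the list of dicts of DataFrames described in the question. It cannot reproduce the pre-0.13 pickle itself, only the reading pattern.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the data in the question:
# a list of dicts, each holding a couple of DataFrames.
data = [
    {"a": pd.DataFrame({"x": [1, 2]}), "b": pd.DataFrame({"y": [3.0, 4.0]})},
    {"a": pd.DataFrame({"x": [5, 6]}), "b": pd.DataFrame({"y": [7.0, 8.0]})},
]

# Write to a temporary file (the name is arbitrary), then read it back
# through pandas instead of calling pickle.load directly.
path = os.path.join(tempfile.mkdtemp(), "example.pic")
pd.to_pickle(data, path)
result = pd.read_pickle(path)
```

Going through pd.read_pickle rather than pickle.load gives pandas the chance to apply its own compatibility handling when the file was written by an older version.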

The pickled objects were created with pandas <= 0.12. Reading them requires a custom unpickler, which 0.13/master (to be released shortly) provides. 0.13 refactored the Series inheritance hierarchy: Series is no longer a subclass of ndarray but of NDFrame, the same base class as DataFrame and Panel. This was done for a great many reasons, mainly to promote code consistency. See here for a more complete description.
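
The hierarchy change can be checked directly. The following is a small sketch, assuming a pandas version from 0.13 onward under Python 3; the import path pandas.core.generic.NDFrame is an internal location and is used here only for illustration.

```python
import numpy as np
import pandas as pd
from pandas.core.generic import NDFrame  # internal path, shown for illustration only

s = pd.Series([1.0, 2.0, 3.0])

# After the refactor, a Series is not an ndarray ...
series_is_ndarray = isinstance(s, np.ndarray)
# ... but it shares the NDFrame base class with DataFrame.
series_is_ndframe = isinstance(s, NDFrame)
frame_is_ndframe = issubclass(pd.DataFrame, NDFrame)

# The underlying data is still available as a plain ndarray on request.
values = s.to_numpy()
```

Code that used to rely on a Series being an ndarray can ask for the array explicitly, as in the last line.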

The error message you are seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, arises because the default Python unpickler makes sure that the class hierarchy that was pickled is exactly the same as the one it is recreating. Since Series changed between versions, this is no longer possible with the default unpickler (IMHO this is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that contain Series objects.
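
To make the idea of a custom unpickler concrete, here is a standard-library-only sketch for Python 3. It is not the pandas implementation: the module name legacy_lib_v0, the classes, and the helper functions are all hypothetical, and the technique shown (overriding pickle.Unpickler.find_class to redirect an old class reference to its replacement) is only an analogy for what a compatibility unpickler has to do.

```python
import io
import pickle
import sys
import types


def make_legacy_payload():
    """Pickle an object whose class lives in a throwaway module (hypothetical)."""
    legacy = types.ModuleType("legacy_lib_v0")

    class Series:
        def __init__(self, values):
            self.values = values

    # Make pickle record the class as legacy_lib_v0.Series.
    Series.__module__ = "legacy_lib_v0"
    Series.__qualname__ = "Series"
    legacy.Series = Series
    sys.modules["legacy_lib_v0"] = legacy
    try:
        return pickle.dumps(Series([1, 2, 3]))
    finally:
        # Simulate the upgrade: the old class is no longer importable.
        del sys.modules["legacy_lib_v0"]


class NewSeries:
    """Stand-in for the class as it exists after a refactor."""

    def __init__(self, values):
        self.values = values


class CompatUnpickler(pickle.Unpickler):
    # Map (module, name) pairs stored in old pickles to current classes.
    RENAMES = {("legacy_lib_v0", "Series"): NewSeries}

    def find_class(self, module, name):
        replacement = self.RENAMES.get((module, name))
        if replacement is not None:
            return replacement
        return super().find_class(module, name)


def compat_loads(payload):
    """Unpickle bytes using the compatibility mapping above."""
    return CompatUnpickler(io.BytesIO(payload)).load()


payload = make_legacy_payload()

# The default unpickler cannot resolve the old class reference.
try:
    pickle.loads(payload)
    default_failed = False
except ImportError:
    default_failed = True

# The compatibility unpickler rebuilds the object as a NewSeries.
restored = compat_loads(payload)
```

The default loader fails because the pickle stream names a class that no longer exists in that form, while the subclass substitutes the current class before the object is rebuilt.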
