Pandas backwards compatibility issue with pickle between 0.14.1 and 0.15.2


Question

We use pandas DataFrames as the primary data container for our time series data. We pack each DataFrame into a binary blob and store it in a MongoDB document, along with keys for metadata about the time series blob.

We ran into an error when we upgraded from pandas 0.14.1 to 0.15.2.

Create a binary blob from the pandas DataFrame (0.14.1)

import lz4
import cPickle

# Pickle the DataFrame with the highest protocol, then compress it with lz4
bd = lz4.compress(cPickle.dumps(df, cPickle.HIGHEST_PROTOCOL))

Error case: read the blob back from MongoDB with pandas 0.15.2

cPickle.loads(lz4.decompress(bd))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-76f7b0b41426> in <module>()
----> 1 cPickle.loads(lz4.decompress(bd))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Index'>, (0,), 'b'))

Success case: reading the blob back from MongoDB with pandas 0.14.1 raises no error.

This seems similar to an older Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:

"The error message you are seeing, `TypeError: _reconstruct: First argument must be a sub-type of ndarray`, arises because the Python default unpickler makes sure that the class hierarchy that was pickled is exactly the same as what it is recreating. Since Series has changed between versions, this is no longer possible with the default unpickler (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."
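To illustrate the point in the quoted comment, here is a minimal sketch using only the standard library. It assumes Python 3 and uses `fractions.Fraction` purely as a stand-in object (it is not part of the original question). The pickle stream holds a reference to the class by module and name, plus a call that rebuilds the object, so the result depends on what that name resolves to when the pickle is loaded.

```python
import pickle
import pickletools
from fractions import Fraction

# Pickle a small object; Fraction is only a stand-in for illustration.
blob = pickle.dumps(Fraction(1, 3), protocol=2)

# Collect (opcode name, argument) pairs from the pickle stream.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(blob)]

# The class travels as a "module name" reference, not as a definition.
class_refs = [arg for name, arg in ops if name == "GLOBAL"]

# REDUCE means "call the referenced callable with the stored arguments".
has_reduce = any(name == "REDUCE" for name, _arg in ops)

print(class_refs, has_reduce)
```

In the traceback above, the stored callable is `_reconstruct` and the class reference is `pandas.core.index.Index`, which is why a change to that class between versions surfaces only at load time.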

Any ideas for a workaround or solution?

To recreate the error:

Setup in a pandas 0.14.1 env:

import cPickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 10))
cPickle.dump(df, open("cp0141.p", "wb"))
cPickle.load(open("cp0141.p", "rb"))  # no error

Reproduce the error in a pandas 0.15.2 env:

cPickle.load(open("cp0141.p", "rb"))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))

Recommended answer

This was explicitly mentioned, as the Index class no longer subclasses ndarray but is now a pandas object; see here.

You simply need to use pd.read_pickle to read the pickles.
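As a rough sketch of how this could be applied to blobs that live in memory rather than on disk: the helper below writes the raw pickle bytes to a temporary file and passes the path to `pd.read_pickle`. The helper name `read_pickled_blob` and the variable `raw` are hypothetical. The sketch assumes Python 3 (`pickle` in place of `cPickle`) and leaves out the lz4 step, so it only demonstrates the round trip and not the cross-version case.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd


def read_pickled_blob(raw_bytes):
    """Load a pickled pandas object from raw (already decompressed) bytes.

    Hypothetical helper: writes the bytes to a temporary file so that
    pd.read_pickle, which accepts a file path, can do the loading.
    """
    fd, path = tempfile.mkstemp(suffix=".p")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw_bytes)
        return pd.read_pickle(path)
    finally:
        os.remove(path)


df = pd.DataFrame(np.random.randn(10, 10))

# Stand-in for the decompressed blob read back from the database.
raw = pickle.dumps(df, pickle.HIGHEST_PROTOCOL)

restored = read_pickled_blob(raw)
```

In the question's setup, `raw` would be the result of `lz4.decompress(bd)`.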
