How to store `pandas.DataFrame` in a PANDAS-LOADABLE binary format other than `pickle`

Problem Description

I have a problem with saving a pandas.DataFrame (1 440 000 000 rows).

From what I can see in the API, the only available options to store (and then load) the frame are either CSV or pickle.

Saving in pickle format ends with a mysterious exception (SystemError: error return without exception set), while saving in CSV is a waste of space even if the file is compressed (a 2-byte np.float16 is much more compact than its ASCII-encoded representation).
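
As a rough illustration of that size gap, a small sketch (the column name, row count, and data here are made up, not from the original question) might compare the raw float16 footprint with its CSV text encoding like this:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real 1 440 000 000-row frame.
df = pd.DataFrame({"x": np.random.rand(1_000_000).astype(np.float16)})

binary_bytes = df["x"].to_numpy().nbytes          # 2 bytes per float16 value
csv_bytes = len(df.to_csv(index=False).encode())  # ASCII digits plus separators

print(f"float16 in memory: {binary_bytes / 1e6:.1f} MB")
print(f"CSV text:          {csv_bytes / 1e6:.1f} MB")
```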

How can I store my dataframe in a loadable, memory-efficient (including disk space) format?

Recommended Answer

I would guess that your data frame is too big. Pickle has some limits. You are much better off either saving to a database or using to_hdf (or one of the many other IO routines; to_msgpack might work as well).

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
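
A minimal sketch of the suggested to_hdf round trip (the file name frame.h5, the key "df", and the compression settings are illustrative choices; PyTables must be installed for HDF5 support):

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for the 1 440 000 000-row one.
df = pd.DataFrame({"x": np.random.rand(100_000)})

# format="table" allows appending in chunks, which helps when the full
# frame cannot be built in memory at once; compression keeps the file small.
df.to_hdf("frame.h5", key="df", mode="w", format="table",
          complevel=9, complib="blosc")

# Read it back directly as a DataFrame, without going through pickle or CSV.
restored = pd.read_hdf("frame.h5", key="df")
print(restored.shape)
```

For a frame this large, writing in pieces with append=True (or via pd.HDFStore) is one way to avoid holding everything in memory while saving.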
