How to store `pandas.DataFrame` in a pandas-loadable binary format other than `pickle`
Problem description
I have a problem saving a pandas.DataFrame (1,440,000,000 rows).
From what I can see in the API, the only available options for storing (and then loading) the data are CSV and pickle.
Saving in pickle format ends with a mysterious exception (SystemError: error return without exception set), while saving as CSV wastes space even when compressed (a 2-byte np.float16 is much more compact than an ASCII-encoded value).
How can I store my dataframe in a loadable, memory-efficient (including disk space) format?
Recommended answer
I would guess that your data frame is too big. Pickle has some limits. You are much better off either saving to a database or using to_hdf (or one of the many other IO routines; to_msgpack might work as well).
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
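As a rough illustration of the to_hdf suggestion, here is a minimal sketch of a round trip through an HDF5 file. The helper names save_frame_hdf and load_frame_hdf, the key "data", and the compression settings are assumptions chosen for the example, not part of the original answer. It also assumes the optional PyTables package (tables) is installed, since pandas needs it for HDF5 I/O. Note that to_msgpack was deprecated in pandas 0.25 and removed in 1.0, so it is not an option on current versions.

```python
import pandas as pd


def save_frame_hdf(df: pd.DataFrame, path: str, key: str = "data") -> None:
    """Write df to an HDF5 file (hypothetical helper; needs PyTables)."""
    df.to_hdf(
        path,
        key=key,          # name of the dataset inside the file (assumed)
        mode="w",         # create or overwrite the file
        format="fixed",   # fast write/read, but not appendable or queryable
        complevel=5,      # compression level, an arbitrary choice for this sketch
        complib="zlib",   # compression library available in every PyTables build
    )


def load_frame_hdf(path: str, key: str = "data") -> pd.DataFrame:
    """Read the frame stored under key back into memory."""
    return pd.read_hdf(path, key=key)
```

Usage would look like save_frame_hdf(df, "frame.h5") followed by df2 = load_frame_hdf("frame.h5"). Which dtypes survive the round trip (np.float16, for instance) may depend on the installed PyTables and HDF5 versions, so it is worth checking on a small sample before writing the full 1.44 billion rows.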