How to store a dataframe using Pandas
Problem description
Right now I'm importing a fairly large CSV as a dataframe every time I run the script. Is there a good solution for keeping that dataframe constantly available in between runs, so I don't have to spend all that time waiting for the script to run?
The easiest way is to pickle it using to_pickle:
df.to_pickle(file_name) # where to save it, usually as a .pkl
Then you can load it back using:
df = pd.read_pickle(file_name)
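As a minimal sketch of that round trip (the frame contents and the temporary file path here are illustrative, standing in for your large CSV import):

```python
import os
import tempfile

import pandas as pd

# A small stand-in for the dataframe you'd normally parse from CSV.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Pickle it once; later runs can reload it instead of re-parsing the CSV.
path = os.path.join(tempfile.gettempdir(), "df.pkl")
df.to_pickle(path)

# Reload and confirm the round trip preserved the data exactly.
df2 = pd.read_pickle(path)
assert df.equals(df2)
```

Reloading a pickle skips CSV parsing and dtype inference entirely, which is where most of the repeated start-up cost goes.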
Note: before 0.11.1, save and load were the only way to do this (they are now deprecated in favor of to_pickle and read_pickle respectively).
Another popular choice is to use HDF5 (pytables) which offers very fast access times for large datasets:
store = pd.HDFStore('store.h5')
store['df'] = df       # save it
df = store['df']       # load it
store.close()
More advanced strategies are discussed in the cookbook.
Since 0.13 there's also msgpack, which may be better for interoperability, as a faster alternative to JSON, or if you have python object/text-heavy data (see this question).