Efficiently writing large Pandas data frames to disk
Question
I am trying to find the best way to efficiently write large data frames (250MB+) to and from disk using Python/Pandas. I've tried all of the methods in Python for Data Analysis, but the performance has been very disappointing.
This is part of a larger project exploring the migration of our current analytic/data management environment from Stata to Python. When I compare the read/write times in my tests to those I get with Stata, Python and Pandas typically take more than 20 times as long.
I strongly suspect that I am the problem, not Python or Pandas.
Any suggestions?
Accepted answer
Using HDFStore is your best bet (it isn't covered much in the book, and has changed quite a lot since). You will find the performance is MUCH better than with any other serialization method.
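As a minimal sketch of the recommended approach, the snippet below writes a frame to an HDF5 store and reads it back. It assumes the PyTables package (`tables`) is installed, and the frame and file name are invented for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative frame; a real workload would load ~250MB+ of data.
df = pd.DataFrame(np.random.randn(100_000, 4), columns=list("abcd"))

# Write the frame to an HDF5 store (requires PyTables).
# format="table" stores it in a queryable, appendable layout.
with pd.HDFStore("data.h5", mode="w") as store:
    store.put("df", df, format="table")

# Read the frame back from the store.
with pd.HDFStore("data.h5", mode="r") as store:
    df2 = store["df"]
```

For simple one-shot round trips, the shortcuts `df.to_hdf("data.h5", key="df")` and `pd.read_hdf("data.h5", "df")` wrap the same machinery.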