How to import a gzip file larger than RAM limit into a Pandas DataFrame? "Kill 9" Use HDF5?


Problem description

I have a gzip which is approximately 90 GB. This is well within disk space, but far larger than RAM.

How can I import this into a pandas dataframe? I tried the following in the command line:

# start with Python 3.4.5
import pandas as pd
filename = 'filename.gzip'   # size 90 GB
df = pd.read_table(filename, compression='gzip')

However, after several minutes, Python shuts down with Kill 9.

After defining the DataFrame object df, I was planning to save it into HDF5.

What is the correct way to do this? How can I use pandas.read_table() to do this?

Recommended answer

I would do it this way:

import pandas as pd

filename = 'filename.gzip'      # size 90 GB
hdf_fn = 'result.h5'
hdf_key = 'my_huge_df'
cols = ['colA','colB','colC','colZ'] # put here a list of all your columns
cols_to_index = ['colA','colZ'] # put here the list of YOUR columns, that you want to index
chunksize = 10**6               # you may want to adjust it ... 

store = pd.HDFStore(hdf_fn)

for chunk in pd.read_table(filename, compression='gzip', header=None, names=cols, chunksize=chunksize):
    # don't index data columns in each iteration - we'll do it later
    store.append(hdf_key, chunk, data_columns=cols_to_index, index=False)

# index data columns in HDFStore
store.create_table_index(hdf_key, columns=cols_to_index, optlevel=9, kind='full')
store.close()
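
Once the store is built this way, the indexed columns let you filter rows without loading the whole table back into RAM. Below is a minimal sketch of reading the result back in bounded-memory chunks; the column name colA and the value 'some_value' are placeholders for whatever columns your data actually has.

import pandas as pd

hdf_fn = 'result.h5'
hdf_key = 'my_huge_df'

# Stream only the rows matching a condition on an indexed column,
# chunk by chunk, so memory usage stays bounded.
for chunk in pd.read_hdf(hdf_fn, hdf_key,
                         where="colA == 'some_value'",
                         chunksize=10**6):
    print(chunk.shape)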
