How to write a large CSV file to HDF5 in Python?


Problem description


I have a dataset that is too large to read directly into memory, and I don't want to upgrade the machine. From my reading, HDF5 may be a suitable solution to my problem. But I am not sure how to iteratively write the dataframe to the HDF5 file, since I cannot load the CSV file as a single dataframe object.


So my question is: how do I write a large CSV file into an HDF5 file with Python pandas?

Answer


You can read the CSV file in chunks using the chunksize parameter of pd.read_csv and append each chunk to the HDF file:

import pandas as pd

# hdf_filename / csv_filename: paths to your output HDF5 file and input CSV
hdf_key = 'hdf_key'
df_cols_to_index = [...]  # list of columns (labels) that should be indexed
store = pd.HDFStore(hdf_filename)

for chunk in pd.read_csv(csv_filename, chunksize=500000):
    # don't index data columns in each iteration - we'll do it once at the end
    store.append(hdf_key, chunk, data_columns=df_cols_to_index, index=False)

# index data columns in HDFStore
store.create_table_index(hdf_key, columns=df_cols_to_index, optlevel=9, kind='full')
store.close()
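Passing index=False during the appends and building the index once at the end keeps the chunked writes fast, since rebuilding the index on every append would be wasted work. Once the store is built and the data columns are indexed, the file can be queried or streamed back without loading everything at once. A minimal read-back sketch, assuming the store above was written to a file named data.h5 and that 'A' is one of the indexed data columns (both names are illustrative placeholders):

import pandas as pd

# Select only the rows matching a condition on an indexed data column.
# The 'where' filter is evaluated by PyTables against the on-disk table,
# so the full dataset never has to fit in memory.
subset = pd.read_hdf('data.h5', 'hdf_key', where='A > 100')

# Or stream the stored table back in chunks, mirroring the chunked write:
for chunk in pd.read_hdf('data.h5', 'hdf_key', chunksize=500000):
    print(chunk.shape)  # replace with your own per-chunk processing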

