How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?


Question

I have the following pandas DataFrame:

import pandas as pd
df = pd.read_csv('filename.csv')

Now, I can use HDFStore to write the df object to a file (much like adding key-value pairs to a Python dictionary):

store = pd.HDFStore('store.h5')
store['df'] = df

http://pandas.pydata.org/pandas-docs/stable/io.html

When I look at the contents, this object is a frame.

store 

Output:

<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[552,23252])

However, in order to use indexing, one should store this as a table object.

My approach was to try HDFStore.put():

HDFStore.put(key="store.h", value=df, format=Table)

However, this fails with the error:

TypeError: put() missing 1 required positional argument: 'self'

How does one save Pandas DataFrames as PyTables tables?

Recommended answer

Common part: create a new HDFStore file or open an existing one:

store = pd.HDFStore('store.h5')

Use this if you want all columns to be indexed (stored as queryable data columns):

store.append('key_name', df, data_columns=True)

Or use this if you want only a subset of the columns to be indexed:

store.append('key_name', df, data_columns=['colA','colC','colN'])
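As a minimal sketch of why data columns matter, the snippet below appends a small frame and then filters it on disk with a where expression. It assumes PyTables is installed. The sample data, the temporary file path, and the query are made up for illustration and are not from the original question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "colA": np.arange(10),
    "colB": np.arange(10) * 0.5,
    "colC": np.arange(10) % 3,
})

# Write to a throwaway file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path, mode="w") as store:
    # Only colA and colC become data columns that can appear in a query.
    store.append("key_name", df, data_columns=["colA", "colC"])

    # The filter is evaluated against the on-disk table.
    result = store.select("key_name", where="(colA > 6) & (colC == 1)")

print(result)
```

Columns not listed in data_columns (here colB) are still stored and returned, but they cannot be referenced in the where expression.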

P.S. HDFStore.append() saves DataFrames in table format by default.
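The TypeError in the question arises because put() was called on the HDFStore class itself rather than on an open store instance; in addition, the format argument is expected to be a string such as 'table'. The following sketch, assuming PyTables is installed and using made-up sample data and key names, compares the default fixed format with the table format:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical sample data; any DataFrame would do.
df = pd.DataFrame({"colA": np.arange(5), "colB": np.arange(5) * 2.0})

path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path, mode="w") as store:
    # Dictionary-style assignment uses the default "fixed" format.
    store["df_fixed"] = df

    # put() is called on the instance, with the format given as a string.
    store.put("df_table", df, format="table")

    # Inspect how each object was stored.
    fixed_type = store.get_storer("df_fixed").format_type
    table_type = store.get_storer("df_table").format_type

print(fixed_type, table_type)
```

Objects stored in the fixed format cannot be appended to or queried with where, which is why the table format is needed for indexing.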
