How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?
Question
I have the following pandas DataFrame:
import pandas as pd
df = pd.read_csv('filename.csv')
Now, I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary):
store = pd.HDFStore('store.h5')
store['df'] = df
http://pandas.pydata.org/pandas-docs/stable/io.html
When I look at the contents, this object is a frame.
store
Output:
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df frame (shape->[552,23252])
However, in order to use indexing, one should store this as a table object.
My approach was to try HDFStore.put(), i.e.
HDFStore.put(key="store.h", value=df, format=Table)
However, this fails with the error:
TypeError: put() missing 1 required positional argument: 'self'
How does one save Pandas DataFrames as PyTables tables?
Recommended answer
Common part: create a new HDFStore file or open an existing one:
store = pd.HDFStore('store.h5')
Try this if you want all columns to be indexed:
store.append('key_name', df, data_columns=True)
Or this if you want to index just a subset of columns:
store.append('key_name', df, data_columns=['colA','colC','colN'])
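To illustrate what indexing a subset of columns allows, here is a minimal sketch, assuming PyTables is installed. The sample data, the extra column name `other`, and the temporary file path are made up for illustration. Columns listed in data_columns can be used in a where clause when reading back with store.select().

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; colA, colC and colN mirror the names used above.
df = pd.DataFrame({
    'colA': [1, 2, 3, 4],
    'colC': [10.0, 20.0, 30.0, 40.0],
    'colN': [0.1, 0.2, 0.3, 0.4],
    'other': [100, 200, 300, 400],
})

# Write to a throwaway file so the example does not touch an existing store.
path = os.path.join(tempfile.mkdtemp(), 'store.h5')

with pd.HDFStore(path) as store:
    # append() writes in table format; the listed columns become queryable.
    store.append('key_name', df, data_columns=['colA', 'colC', 'colN'])

    # Filter on disk using one of the data columns.
    subset = store.select('key_name', where='colA > 2')

print(subset)
```

A column that is not listed in data_columns (here `other`) is still stored and returned, but it cannot be referenced in the where expression.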
PS: HDFStore.append() saves DataFrames in table format by default.
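For completeness, the put() call from the question can also be made to work. The TypeError arises because put() was called on the HDFStore class itself rather than on an open store instance. In addition, the key is the name of the node inside the file (not the file name), and the format is passed as the string 'table'. A minimal sketch, assuming PyTables is installed and using made-up sample data and a temporary file path:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'colA': [1, 2, 3], 'colB': [4.0, 5.0, 6.0]})
path = os.path.join(tempfile.mkdtemp(), 'store.h5')

store = pd.HDFStore(path)  # an instance, not the HDFStore class
try:
    # put() uses the fixed format unless told otherwise;
    # format='table' requests a PyTables table instead.
    store.put('df', df, format='table', data_columns=True)

    # Check how the node was stored, then read it back.
    is_table = store.get_storer('df').is_table
    roundtrip = store['df']
finally:
    store.close()

print(is_table)
print(roundtrip)
```

Unlike append(), put() replaces whatever is already stored under the key, so it suits one-shot writes, while append() suits adding rows incrementally.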