Save multiple pd.DataFrames with hierarchy to hdf5

Question

I have multiple pd.DataFrames that are organized hierarchically. Let's say I have:

day_temperature_london_df = pd.DataFrame(...)
night_temperature_london_df = pd.DataFrame(...)

day_temperature_paris_df = pd.DataFrame(...)
night_temperature_paris_df = pd.DataFrame(...)

I want to group them in an hdf5 file so that two of them go into the group 'london' and the other two go into the group 'paris'.

If I use h5py, I lose the format of the pd.DataFrame, including its index and columns.

import h5py

f = h5py.File("temperature.h5", "w")
grp_london = f.create_group("london")
# h5py converts the DataFrame to a plain array, so index and columns are not stored
day_lon_dset = grp_london.create_dataset("day", data=day_temperature_london_df)
print(day_lon_dset[...])

This gives me a numpy array. Is there a way to store many dataframes with a hierarchy in the same way .to_hdf does, so that all the properties of each dataframe are kept?

Recommended answer

I'm more familiar with numpy and h5py than with pandas, but I was able to create:

In [85]: store = pd.HDFStore('store.h5')
In [86]: store.root
Out[86]: 
/ (RootGroup) ''
  children := []
In [87]: store['df1']=df1
In [88]: store['group/df1']=df1
In [89]: store['group/df2']=df2

This can be reloaded and inspected:

In [95]: store
Out[95]: 
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df1                  frame        (shape->[3,4])
/group/df1            frame        (shape->[3,4])
/group/df2            frame        (shape->[5,6])

In [96]: store.root
Out[96]: 
/ (RootGroup) ''
  children := ['df1' (Group), 'group' (Group)]

store._handle shows the file structure in more detail.
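Applied to the frames in the question, the path-style keys could look like the sketch below. The sample values, the "hour" index, and the temporary file location are assumptions made for illustration; only the 'london' and 'paris' group names come from the question. It also assumes the PyTables package is installed, since pd.HDFStore depends on it.

```python
import os
import tempfile

import pandas as pd  # pd.HDFStore also needs the PyTables package ("tables")

# Hypothetical sample data standing in for the four frames in the question.
hours = pd.Index([0, 1, 2], name="hour")
frames = {
    "london/day": pd.DataFrame({"temp": [15.0, 16.5, 17.0]}, index=hours),
    "london/night": pd.DataFrame({"temp": [9.0, 8.5, 8.0]}, index=hours),
    "paris/day": pd.DataFrame({"temp": [18.0, 19.5, 20.0]}, index=hours),
    "paris/night": pd.DataFrame({"temp": [11.0, 10.5, 10.0]}, index=hours),
}

# Write to a temporary location so the sketch does not overwrite real files.
path = os.path.join(tempfile.mkdtemp(), "temperature.h5")

# A key such as "london/day" places the frame inside the group "london".
with pd.HDFStore(path, mode="w") as store:
    for key, df in frames.items():
        store[key] = df

# Reopen read-only and load one frame back as a DataFrame.
with pd.HDFStore(path, mode="r") as store:
    stored_keys = sorted(store.keys())
    london_day = store["london/day"]

print(stored_keys)
print(london_day)
```

Each value read back this way is a DataFrame with its index and columns, rather than the bare array that the h5py dataset in the question returns.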

In a shell I can also look at the file with:

1431:~/mypy$ h5dump store.h5 |less

Following the question "How should I use h5py lib for storing time series data", the same file can be opened with h5py:

In [4]: f1 = h5py.File('store.h5')
In [5]: list(f1.keys())
Out[5]: ['df1', 'group']
In [6]: list(f1['df1'].keys())
Out[6]: ['axis0', 'axis1', 'block0_items', 'block0_values']

In [10]: list(f1['group'].keys())
Out[10]: ['df1', 'df2']
In [11]: list(f1['group/df1'].keys())
Out[11]: ['axis0', 'axis1', 'block0_items', 'block0_values']
In [12]: list(f1['group/df2'].keys())
Out[12]: ['axis0', 'axis1', 'block0_items', 'block0_values']

So the 'group/df2' key is equivalent to a hierarchy of groups:

In [13]: gp = f1['group']
In [15]: gp['df2']['axis0']
Out[15]: <HDF5 dataset "axis0": shape (6,), type "<i8">
In [17]: f1['group/df2/axis0']
Out[17]: <HDF5 dataset "axis0": shape (6,), type "<i8">
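The equivalence between a slash-separated key and step-by-step group access can be checked with h5py alone. The following is a minimal sketch under stated assumptions: the file name groups.h5 and the array contents are made up for illustration, and the file is built directly with h5py rather than by pandas.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "groups.h5")

with h5py.File(path, "w") as f:
    # A slash-separated dataset name creates the intermediate groups
    # "group" and "group/df2" automatically.
    f.create_dataset("group/df2/axis0", data=np.arange(6, dtype="<i8"))

with h5py.File(path, "r") as f:
    by_path = f["group/df2/axis0"]          # one lookup with the full path
    by_steps = f["group"]["df2"]["axis0"]   # one lookup per level
    same_name = by_path.name == by_steps.name
    same_data = bool(np.array_equal(by_path[...], by_steps[...]))
    top_level = list(f.keys())
    is_group = isinstance(f["group/df2"], h5py.Group)

print(same_name, same_data, top_level, is_group)
```

Both lookups resolve to the same dataset, and the intermediate node "group/df2" is an ordinary HDF5 group.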

We'd have to dig further into the docs or code of HDFStore or PyTables to see whether they have an equivalent of create_group.
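The sketch below does not settle whether an explicit create_group equivalent is needed. It only illustrates, under assumptions, that the groups implied by a key path are created when a frame is stored, and that the resulting hierarchy can be listed with HDFStore.walk(), which assumes a pandas version that provides that method. The sample frame and file location are hypothetical.

```python
import os
import tempfile

import pandas as pd  # pd.HDFStore also needs the PyTables package ("tables")

# Hypothetical frame and scratch file, used only for illustration.
df = pd.DataFrame({"temp": [1.0, 2.0]})
path = os.path.join(tempfile.mkdtemp(), "temperature.h5")

# No group is created explicitly; the keys alone define the hierarchy.
with pd.HDFStore(path, mode="w") as store:
    for key in ("london/day", "london/night", "paris/day", "paris/night"):
        store[key] = df

# walk() yields (group path, child group names, names of stored pandas objects).
with pd.HDFStore(path, mode="r") as store:
    layout = {
        group_path: (sorted(groups), sorted(leaves))
        for group_path, groups, leaves in store.walk()
    }

for group_path, (groups, leaves) in sorted(layout.items()):
    print(repr(group_path), groups, leaves)
```

The root group is reported with an empty path string, and each city group lists its two stored frames as leaves.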
