Listing datasets in a group in HDF5
Question
I decided to store my data in HDF5, using its hierarchical structure instead of relying on the filesystem. Unfortunately, I'm having performance issues.
My data is laid out as follows: I have about 70 top-level groups, one per date, and each of them contains roughly 8000 datasets. I would like to see a list of the number of datasets per day:
for date in hdf5.keys():
    print(len(hdf5[date]))
I'm finding it a little frustrating that this takes more than 2 seconds per iteration.
Also, I have two different HDF5 files with the above layout, and the bigger one is much slower at this.
What am I doing wrong?
Answer
Try creating the file with the libver='latest' flag:
import h5py
f = h5py.File('name.hdf5', 'w', libver='latest')  # 'w' creates the file; recent h5py versions default to read-only
This should be much faster if you have a lot of datasets per group or a lot of attributes per dataset.
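To make the suggestion concrete, here is a minimal sketch that writes a small file with libver='latest' and then counts the members of each top-level group, as in the question. The helper names (build_sample_file, count_datasets_per_group), the dataset names, and the sample dates are made up for illustration, and the sketch assumes every top-level entry is a group.

```python
import os
import tempfile

import h5py
import numpy as np


def build_sample_file(path, dates, datasets_per_date):
    """Write one group per date, asking HDF5 for its newest file format."""
    with h5py.File(path, "w", libver="latest") as f:
        for date in dates:
            group = f.create_group(date)
            for i in range(datasets_per_date):
                # Hypothetical dataset names and contents, for illustration only.
                group.create_dataset(f"series_{i:04d}", data=np.arange(3))


def count_datasets_per_group(path):
    """Return {group name: number of members} for each top-level group."""
    with h5py.File(path, "r") as f:
        return {name: len(f[name]) for name in f.keys()}


# Build a tiny sample file in a temporary directory and count its contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.hdf5")
    build_sample_file(sample_path, ["2014-01-01", "2014-01-02"], 50)
    counts = count_datasets_per_group(sample_path)

for date, n in sorted(counts.items()):
    print(date, n)
```

Two caveats, as far as I know: the flag influences how objects are written, so an existing file would probably have to be copied into a newly created one to benefit, and files written this way may not be readable by older HDF5 releases.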