Reading nested .h5 group into numpy array

Published: 2021/4/9 20:13:05 | Tags: python arrays numpy hdf5 h5py

Problem description

I received this .h5 file from a friend and I need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found many questions and answers here about reading these files, but I couldn't find a way to get down to the lower levels of the groups or folders the file contains. The file contains two main folders, X and Y. X contains a folder named 0, which contains two folders named A and B. Y contains ten folders named 1-10. The data I want to read is in A, B, 1, 2, ..., 10. For instance, I start with:

f = h5py.File(filename, 'r')
f.keys()

Now f.keys() returns [u'X', u'Y'], the two main folders.
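To experiment without the original file, here is a minimal sketch that builds a file with the layout described above (X/0/A, X/0/B and Y/1 to Y/10). It assumes that A, B and 1-10 are datasets rather than further groups; the file name `example.h5`, the shapes and the random values are hypothetical placeholders, not the friend's actual data.

```python
import os
import tempfile

import h5py
import numpy as np


def build_example_file(path):
    """Create a hypothetical file mirroring the described layout.

    Shapes and values are placeholders; the real file's contents are unknown.
    """
    rng = np.random.default_rng(0)
    with h5py.File(path, "w") as f:
        # create_dataset creates the intermediate groups X and X/0 automatically
        f.create_dataset("X/0/A", data=rng.random((5, 3)))
        f.create_dataset("X/0/B", data=rng.random((5, 3)))
        # Y holds ten datasets named "1" to "10"
        for i in range(1, 11):
            f.create_dataset(f"Y/{i}", data=rng.random(4))


# Hypothetical file name inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.h5")
build_example_file(path)

with h5py.File(path, "r") as f:
    top_level = list(f.keys())  # only the two top-level groups are listed
    print(top_level)  # ['X', 'Y']
```

As in the question, `keys()` on the file only shows the first level; everything below X and Y has to be reached through those groups.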

Then I try to read X and Y using read_direct, but I get the error:

AttributeError: 'Group' object has no attribute 'read_direct'
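For reference, `read_direct` is a method of `h5py.Dataset`, not of `h5py.Group`, which is why calling it on X or Y raises this error. A minimal sketch of its intended use on a dataset, using a throwaway in-memory file (the `core` driver with `backing_store=False`); the path `X/0/A` and its contents are hypothetical stand-ins modeled on the layout described here.

```python
import h5py
import numpy as np

# In-memory file: driver="core" with backing_store=False writes nothing to disk
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical dataset standing in for the real X/0/A
    f.create_dataset("X/0/A", data=np.arange(12, dtype="f8").reshape(3, 4))

    group = f["X"]
    print(type(group).__name__)  # Group: no read_direct, shape or dtype here

    dset = f["X/0/A"]  # a Dataset, reached by its full path
    # read_direct needs a preallocated, C-contiguous destination array
    out = np.empty(dset.shape, dtype=dset.dtype)
    dset.read_direct(out)  # copies the dataset's contents into `out`

# `out` is an ordinary NumPy array and stays usable after the file is closed
print(out.sum())  # 66.0
```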

I try to create an object for X and Y as follows:

obj1 = f['X']

obj2 = f['Y']

Then, if I use

obj1.shape
obj1.dtype

I get the error:

AttributeError: 'Group' object has no attribute 'shape'

I can see that these commands don't work because I am using them on X and Y, which are folders that contain no data, only other folders.

So my question is: how do I get down to the folders named A, B, and 1-10 to read the data?

I couldn't find this even in the documentation: http://docs.h5py.org/en/latest/quick.html

Recommended answer

You need to traverse down your HDF5 hierarchy until you reach a dataset. Groups do not have a shape or dtype; datasets do.
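If the layout is known in advance, you can skip any search and index straight to a dataset by its full path (`f['X/0/A']`) or step through the groups one level at a time (`f['X']['0']['A']`); both reach the same dataset. A minimal sketch on a hypothetical in-memory file built to match the question's description:

```python
import h5py
import numpy as np

with h5py.File("layout.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical contents standing in for the real file
    f.create_dataset("X/0/A", data=np.ones((2, 3)))
    f.create_dataset("Y/1", data=np.zeros(4))

    a_by_path = f["X/0/A"]         # full path in one step
    a_by_steps = f["X"]["0"]["A"]  # one group at a time
    same = a_by_path.name == a_by_steps.name  # both are '/X/0/A'

    # Groups and datasets are different classes; only datasets have shape/dtype
    is_group = isinstance(f["X"], h5py.Group)
    is_dataset = isinstance(a_by_path, h5py.Dataset)

    arr = a_by_path[:]  # read into a NumPy array while the file is still open

print(same, is_group, is_dataset, arr.shape)  # True True True (2, 3)
```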

Assuming you do not know your hierarchy structure in advance, you can use a recursive algorithm to yield, via an iterator, full paths to all available datasets in the form /group1/group2/.../dataset. Below is an example.

import h5py

def traverse_datasets(hdf_file):

    def h5py_dataset_iterator(g, prefix=''):
        for key in g.keys():
            item = g[key]
            path = f'{prefix}/{key}'
            if isinstance(item, h5py.Dataset): # test for dataset
                yield (path, item)
            elif isinstance(item, h5py.Group): # test for group (go down)
                yield from h5py_dataset_iterator(item, path)

    for path, _ in h5py_dataset_iterator(hdf_file):
        yield path

You can, for example, iterate over all dataset paths and print the properties that interest you:
with h5py.File(filename, 'r') as f:
    for dset in traverse_datasets(f):
        print('Path:', dset)
        print('Shape:', f[dset].shape)
        print('Data type:', f[dset].dtype)
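As an alternative to the hand-written recursion, h5py has a built-in walker, `Group.visititems`, which calls a function with `(name, obj)` for every object below a group; the names are relative to that group and have no leading slash. A minimal sketch of the same dataset listing built on it (the in-memory file and its contents are hypothetical stand-ins):

```python
import h5py
import numpy as np


def dataset_paths(group):
    """Return the relative paths of all datasets below `group`."""
    paths = []

    def collect(name, obj):
        if isinstance(obj, h5py.Dataset):
            paths.append(name)
        # Implicitly returns None; visititems stops early on any other value

    group.visititems(collect)
    return paths


with h5py.File("walk.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("X/0/A", data=np.ones(3))
    f.create_dataset("X/0/B", data=np.ones(3))
    f.create_dataset("Y/1", data=np.ones(2))

    found = dataset_paths(f)
    for p in found:
        print('Path:', p, 'Shape:', f[p].shape)

print(sorted(found))  # ['X/0/A', 'X/0/B', 'Y/1']
```

The groups X, X/0 and Y are visited too, but the `isinstance` check keeps only datasets, just like the recursive generator.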

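Remember that, by default, h5py does not read HDF5 datasets into memory: f[dset] returns a Dataset object that reads from the file only when indexed. You can load a dataset into memory as a NumPy array via arr = f[dset][:], where dset is the full path.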
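Building on that, here is a minimal sketch that copies every dataset under a group into a dict of NumPy arrays and then stacks the ten `Y` datasets into one 2-D array. It assumes, hypothetically, that `Y/1` to `Y/10` are datasets of equal shape. Because h5py normally lists member names in alphabetical order ('1', '10', '2', ...), the keys are sorted numerically before stacking.

```python
import h5py
import numpy as np


def load_group(group):
    """Read every dataset below `group` into memory as {relative_path: ndarray}."""
    arrays = {}

    def collect(name, obj):
        if isinstance(obj, h5py.Dataset):
            arrays[name] = obj[()]  # [()] also works for scalar (0-d) datasets

    group.visititems(collect)
    return arrays


with h5py.File("load.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical stand-in data: Y/i holds a length-3 vector filled with i
    for i in range(1, 11):
        f.create_dataset(f"Y/{i}", data=np.full(3, i, dtype="i4"))
    y_arrays = load_group(f["Y"])

# The file is closed here; y_arrays holds plain in-memory NumPy arrays
ordered = sorted(y_arrays, key=int)  # '1', '2', ..., '10' instead of '1', '10', '2'
y_stacked = np.stack([y_arrays[k] for k in ordered])

print(y_stacked.shape)  # (10, 3)
print(y_stacked[:, 0])  # [ 1  2  3  4  5  6  7  8  9 10]
```

For very large files, reading only the datasets you need (for example `f['X/0/A'][:]`) avoids loading everything at once.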
