Filter HDF dataset from H5 file using attribute


Problem Description

I have an h5 file containing multiple groups and datasets. Each dataset has associated attributes. I want to find/filter the datasets in this h5 file based on the attributes associated with them.

Example:

dataset1 = cloudy (attribute)
dataset2 = rainy (attribute)
dataset3 = cloudy (attribute)

I want to find the datasets whose weather attribute/metadata is cloudy.

What is the simplest approach to get this done in a Pythonic way?

Recommended Answer

There are two ways to access HDF5 data with Python: h5py and pytables. Both are good, with different capabilities:

  • h5py (from the h5py FAQ): attempts to map the HDF5 feature set to NumPy as closely as possible. Some say that makes h5py more "pythonic".
  • PyTables (from the PyTables FAQ): builds an additional abstraction layer on top of HDF5 and NumPy. It has more extensive search capabilities (compared to h5py).

When working with HDF5 data, it is important to understand the HDF5 data model; that goes beyond the scope of this post. For simplicity's sake, think of the data model as a file system, where "groups" and "datasets" are like "folders" and "files". Both can have attributes. "Node" is the term used to refer to either a "group" or a "dataset".

@Kiran Ramachandra outlined a method with h5py. Since you tagged your post with pytables, outlined below is the same process with pytables.

Note: Kiran's example assumes datasets 1, 2, 3 are all at the root level. You said you also have groups, and likely those groups also contain some datasets. You can use the HDFView utility to view the data model and your data.

import tables as tb
h5f = tb.open_file('a.h5')  # mode defaults to 'r' (read-only)

This gives you a file object you can use to access other objects (groups or datasets).

h5f.walk_nodes() 

It is an iterable over the nodes and subnodes that yields the complete HDF5 data structure (remember, "nodes" can be either groups or datasets). You can list all nodes and their types with:

for anode in h5f.walk_nodes():
    print(anode)
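
Putting the pieces together, here is a minimal sketch of the filter you asked about. It assumes the attribute layout of the example file created at the end of this answer; walk_nodes() with classname='Leaf' restricts the walk to datasets:

import tables as tb

# Minimal sketch: walk every node, keep only datasets ('Leaf' nodes)
# that carry a 'cloudy' attribute, and collect their full paths.
with tb.open_file('a.h5', 'r') as h5f:
    cloudy = [anode._v_pathname
              for anode in h5f.walk_nodes('/', classname='Leaf')
              if 'cloudy' in anode._v_attrs._v_attrnames]

print(cloudy)  # e.g. ['/dataset1', '/agroup/dataset3']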

Use the following to get a (non-recursive) Python list of the child nodes of a group (here, the root):

h5f.list_nodes('/')

This will fetch the value of the attribute cloudy from dataset1 (if it exists):

h5f.root.dataset1._f_getattr('cloudy')
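
Note that _f_getattr() raises an AttributeError when the named attribute does not exist, so a guarded check against the node's attribute names avoids the exception. A minimal sketch:

node = h5f.root.dataset1
# Only fetch the attribute if it is actually present on the node.
if 'cloudy' in node._v_attrs._v_attrnames:
    print(node._f_getattr('cloudy'))
else:
    print('dataset1 has no cloudy attribute')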

If you want all attributes for a node, use this (shown for dataset1):

ds1_attrs = h5f.root.dataset1._v_attrs._v_attrnames
for attr_name in ds1_attrs:
    print('Attribute', attr_name, '=', h5f.root.dataset1._f_getattr(attr_name))

All of the above reference dataset1 at the root level (h5f.root). If a dataset is in a group, simply add the group name to the path. For dataset2 in a group named agroup, use:

h5f.root.agroup.dataset2._f_getattr('rainy')

This will fetch the value of the attribute rainy from dataset2 in agroup (if it exists).

If you want all attributes for dataset2:

ds2_attrs = h5f.root.agroup.dataset2._v_attrs._v_attrnames
for attr_name in ds2_attrs:
    print('Attribute', attr_name, '=', h5f.root.agroup.dataset2._f_getattr(attr_name))

For completeness, enclosed below is the code used to create the a.h5 file in this example. numpy is only required to define the dtype when creating the tables. In general, HDF5 files are interchangeable (so you can open this example with h5py).

import tables as tb
import numpy as np

h5f = tb.open_file('a.h5', 'w')

# create dataset1 at the root level, and assign an attribute
ds_dtype = np.dtype([('a', int), ('b', float)])
dataset1 = h5f.create_table(h5f.root, 'dataset1', description=ds_dtype)
dataset1._f_setattr('cloudy', 'True')

# create a group at the root level
h5f.create_group(h5f.root, 'agroup')

# create dataset2 and dataset3 at the root.agroup level, and assign attributes
dataset2 = h5f.create_table(h5f.root.agroup, 'dataset2', description=ds_dtype)
dataset2._f_setattr('rainy', 'True')
dataset3 = h5f.create_table(h5f.root.agroup, 'dataset3', description=ds_dtype)
dataset3._f_setattr('cloudy', 'True')

h5f.close()
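
Because the file is plain HDF5, the same filter can also be written with h5py. A minimal sketch, assuming the a.h5 layout created above (visititems() and the attrs mapping are standard h5py API):

import h5py

# Minimal sketch: visit every object in the file and report
# datasets that carry a 'cloudy' attribute.
def find_cloudy(name, obj):
    if isinstance(obj, h5py.Dataset) and 'cloudy' in obj.attrs:
        print(name, obj.attrs['cloudy'])

with h5py.File('a.h5', 'r') as f:
    f.visititems(find_cloudy)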
