How to write data to a compound dataset using h5py?

Question

I know that in C we can easily construct a compound dataset using a struct type and assign data to it chunk by chunk. I am currently implementing a similar structure in Python with h5py.

import h5py
import numpy as np

# create an HDF5 file; h5py >= 3.0 opens files read-only ("r") by default,
# so pass mode "a" explicitly
f = h5py.File("test.h5", "a")

# define a compound datatype using np.dtype
dt_type = np.dtype({"names": ["image", "feature"],
                    "formats": [('<f4', (4, 4)), ('<f4', (10,))]})

# define the dataset with 5 instances
a = f.create_dataset("test", shape=(5,), dtype=dt_type)
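To make the C struct analogy concrete, the sketch below (NumPy only, no file I/O) inspects the layout of the same compound dtype. Each record is a packed block holding a 4x4 float32 subarray followed by a 10-element float32 subarray, roughly like a C `struct { float image[4][4]; float feature[10]; }`; that struct is only an illustrative analogy, not something h5py generates.

```python
import numpy as np

# same compound dtype as in the question
dt_type = np.dtype({"names": ["image", "feature"],
                    "formats": [('<f4', (4, 4)), ('<f4', (10,))]})

# dtype.fields maps each name to (field dtype, byte offset)
image_dt, image_offset = dt_type.fields["image"]
feature_dt, feature_offset = dt_type.fields["feature"]

print(image_dt.shape, image_offset)      # (4, 4) 0
print(feature_dt.shape, feature_offset)  # (10,) 64
print(dt_type.itemsize)                  # 104 bytes per record: 16*4 + 10*4
```

Each field is a fixed-shape subarray of `float32`, so a single record has a fixed size, just as a C struct would.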

To write data, we can do this...

# each "feature" entry is a 1D array of 10 floats
a['feature']

The output is:

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)

# Write 1s to data field "feature", then read it back
a["feature"] = np.ones((5,10))
a["feature"]

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)

The problem occurs when I write the 2D array field "image" to the file:

a["image"] = np.ones((5,4,4))

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

I read the documentation and searched for solutions, but unfortunately I did not find a good one. I understand that we could use groups/datasets to mimic this compound data, but I really want to keep this structure. Is there a good way to do this?

Any help would be appreciated. Thank you.

Answer

You can use PyTables (aka tables) to populate your HDF5 file with the desired arrays. You should think of each row as an independent entry (defined by a dtype). So, the 'image' array is stored as 5 (4x4) ndarrays, not a single (5x4x4) ndarray. The same goes for the 'feature' array.
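The row-oriented view described here can be illustrated with NumPy alone. In a structured array built from the answer's dtype, each row is an independent entry holding its own (10,) 'feature' and (4, 4) 'image', while selecting one field across all rows gives a stacked (5, 4, 4) view. This is a sketch of the in-memory NumPy behavior only, not of how PyTables lays the table out on disk.

```python
import numpy as np

# same compound dtype as in the answer
dt_type = np.dtype({"names": ["feature", "image"],
                    "formats": [('<f4', (10,)), ('<f4', (4, 4))]})

rows = np.zeros(5, dtype=dt_type)

# per-row assignment: one tuple with one value per field
rows[0] = (np.arange(10), np.eye(4))
print(rows[0]["image"].shape)   # (4, 4)

# per-field access across all rows: a stacked view of the subarrays
print(rows["image"].shape)      # (5, 4, 4)
print(rows["feature"].shape)    # (5, 10)
```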

This example adds the 'feature' and 'image' arrays one row at a time. Alternatively, you can create a NumPy record array containing both fields for multiple rows, then add all of them at once with the Table.append() method.
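The record-array alternative might look like the sketch below: all rows are filled in a NumPy structured array first, then added with a single `Table.append()` call, which accepts a structured array whose dtype matches the table description. The file name `test2_tb.h5` and node name `test2` are hypothetical placeholders, and the file is put in a temporary directory only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np
import tables as tb

# same compound dtype as in the answer
dt_type = np.dtype({"names": ["feature", "image"],
                    "formats": [('<f4', (10,)), ('<f4', (4, 4))]})

# build all 5 rows in memory first
rec = np.zeros(5, dtype=dt_type)
rec["feature"] = np.arange(5)[:, None] * np.ones(10)  # row i holds i*ones(10)
rec["image"] = np.random.random((5, 4, 4))

# hypothetical file name, placed in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "test2_tb.h5")

with tb.open_file(path, "w") as h5f:
    table = h5f.create_table("/", "test2", description=dt_type)
    table.append(rec)  # add all rows in a single call
    table.flush()

with tb.open_file(path, "r") as h5fr:
    table = h5fr.get_node("/", "test2")
    nrows = table.nrows
    images = table.col("image")  # one column as a NumPy array
    data = table.read()          # all rows as a structured array

print(nrows, images.shape)  # 5 (5, 4, 4)
```

Appending a block of rows at once avoids the per-row Python loop, which tends to matter once the number of rows grows.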

The code below creates the file, then reopens it read-only to check the data.

import tables as tb
import numpy as np

# open h5 file for writing
with tb.open_file('test1_tb.h5', 'w') as h5f:

    # define a compound datatype using np.dtype
    dt_type = np.dtype({"names": ["feature", "image"],
                        "formats": [('<f4', (10,)), ('<f4', (4, 4))]})

    # create an empty table (dataset) whose columns follow dt_type
    a = h5f.create_table('/', "test1", description=dt_type)

    # get the table's row accessor
    a_row = a.row
    # fill one row at a time and append it to the table
    for i in range(5):
        a_row['feature'] = i * np.ones(10)
        a_row['image'] = np.random.random((4, 4))
        a_row.append()

    # write buffered rows to disk
    a.flush()

# open h5 file read-only and print its contents
with tb.open_file('test1_tb.h5', 'r') as h5fr:
    a = h5fr.get_node('/', 'test1')
    print(a.coldtypes)
    print('# of rows:', a.nrows)

    for row in a:
        print(row['feature'])
        print(row['image'])
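If you would rather keep using h5py as in the question, one possible workaround, offered as a hedged sketch rather than a confirmed fix for every h5py version, is to avoid writing the subarray field on its own: fill a complete NumPy structured array with the same dtype in memory, then write whole records (all fields together) to the dataset, optionally in chunks. The temporary-file handling is only there to keep the example self-contained.

```python
import os
import tempfile

import h5py
import numpy as np

# same compound dtype as in the question
dt_type = np.dtype({"names": ["image", "feature"],
                    "formats": [('<f4', (4, 4)), ('<f4', (10,))]})

# fill an in-memory structured array first; NumPy handles the
# subarray field assignment here without going through h5py
records = np.zeros(5, dtype=dt_type)
records["image"] = np.ones((5, 4, 4))
records["feature"] = np.ones((5, 10))

# temporary file location, only to keep the example self-contained
path = os.path.join(tempfile.mkdtemp(), "test.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset("test", shape=(5,), dtype=dt_type)
    # write whole records (all fields together), chunk by chunk
    dset[0:3] = records[0:3]
    dset[3:5] = records[3:5]

# read the full compound dataset back as a structured array
with h5py.File(path, "r") as f:
    data = f["test"][()]

print(data["image"].shape)    # (5, 4, 4)
print(data["feature"].shape)  # (5, 10)
```

This keeps the compound structure in a single h5py dataset; the trade-off is that each chunk of records has to be assembled in memory before it is written.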