Saving with h5py arrays of different sizes

Question

I am trying to store about 3000 NumPy arrays using the HDF5 data format. The arrays are np.float64 and vary in length from 5306 to 121999 elements.

I am getting the error "Object dtype dtype('O') has no native HDF5 equivalent", because the arrays have different lengths, so NumPy falls back to the generic object dtype.

My idea was to pad all the arrays to length 121999 and store the original sizes in a separate dataset.
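
A minimal sketch of that padding idea, assuming the arrays are 1-D float64 arrays held in a Python list. The file name padded.hdf5, the dataset names data and lengths, the helper names save_padded/load_padded, and the pad value 0.0 are all hypothetical choices for illustration.

```python
import h5py
import numpy as np


def save_padded(path, arrays):
    """Hypothetical helper: pad every array to the longest length, store true lengths."""
    lengths = np.array([len(x) for x in arrays], dtype=np.int64)
    # One row per array; unused tail of each row stays at the pad value 0.0.
    padded = np.zeros((len(arrays), lengths.max()), dtype=np.float64)
    for row, x in zip(padded, arrays):
        row[: len(x)] = x  # each row is a view, so this writes into `padded`
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=padded)
        f.create_dataset("lengths", data=lengths)


def load_padded(path):
    """Hypothetical helper: trim each stored row back to its original length."""
    with h5py.File(path, "r") as f:
        data = f["data"][:]
        lengths = f["lengths"][:]
    return [row[:n] for row, n in zip(data, lengths)]


arrays = [np.array([0.1, 0.2, 0.3]), np.array([0.1, 0.2, 0.3, 0.4, 0.5]), np.array([0.1, 0.2])]
save_padded("padded.hdf5", arrays)
restored = load_padded("padded.hdf5")
```

Everything fits in two datasets, but data always holds len(arrays) × max_length values no matter how short most arrays are, which is exactly the space cost in question.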

However, this seems quite wasteful in space. Is there a better way?
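
To put a number on that cost, using only the figures from the question (3126 arrays, maximum length 121999, 8 bytes per float64 value) and ignoring any compression, the padded dataset alone takes:

```latex
3126 \times 121999 \times 8\ \text{bytes} = 3\,050\,950\,992\ \text{bytes} \approx 3.05\ \text{GB}
```

By comparison, the unpadded data needs 8 bytes times the sum of the actual lengths, which is at least 3126 × 5306 × 8 ≈ 133 MB if every array were as short as the shortest one.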

To clarify: I want to store 3126 arrays of dtype np.float64. I keep them in a list, and when h5py processes that list it converts it to a single array of dtype object, because the arrays have different lengths. To illustrate:

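import numpy as np

a = np.array([0.1,0.2,0.3],dtype=np.float64)
b = np.array([0.1,0.2,0.3,0.4,0.5],dtype=np.float64)
c = np.array([0.1,0.2],dtype=np.float64)

# Roughly what happens inside the h5py call. Older NumPy built this object
# array implicitly; NumPy >= 1.24 raises ValueError for ragged input unless
# dtype=object is passed explicitly.
arrs = np.array([a,b,c], dtype=object)
print(arrs.dtype)     # object
print(arrs[0].dtype)  # float64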

Answer

It looks like you tried something like:

In [364]: f=h5py.File('test.hdf5','w')    
In [365]: grp=f.create_group('alist')

In [366]: grp.create_dataset('alist',data=[a,b,c])
...
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

But if you save each array as a separate dataset instead, it works:

In [367]: adict=dict(a=a,b=b,c=c)

In [368]: for k,v in adict.items():
   .....:     grp.create_dataset(k,data=v)
   .....:

In [369]: grp
Out[369]: <HDF5 group "/alist" (3 members)>

In [370]: grp['a'][:]
Out[370]: array([ 0.1,  0.2,  0.3])

And to read back all the datasets in the group:

In [389]: [i[:] for i in grp.values()]
Out[389]: 
[array([ 0.1,  0.2,  0.3]),
 array([ 0.1,  0.2,  0.3,  0.4,  0.5]),
 array([ 0.1,  0.2])]
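
Outside IPython, the same one-dataset-per-array approach might look like the sketch below when the arrays start out in a list, as in the question. The file name ragged.hdf5, the group name arrays, the arr_0000-style dataset names, and the helper names are hypothetical choices. The zero-padded names matter because, by default, h5py iterates a group's members in name order rather than insertion order, so arr_0010 has to sort after arr_0009 for the list to come back in its original order; four digits cover the 3126 arrays mentioned.

```python
import h5py
import numpy as np


def save_ragged(path, arrays):
    """Hypothetical helper: store each 1-D array as its own dataset in one group."""
    with h5py.File(path, "w") as f:
        grp = f.create_group("arrays")
        for i, arr in enumerate(arrays):
            # Zero-padded names sort in the same order as the list indices.
            grp.create_dataset(f"arr_{i:04d}", data=arr)


def load_ragged(path):
    """Hypothetical helper: read every dataset in the group back into a list."""
    with h5py.File(path, "r") as f:
        return [ds[:] for ds in f["arrays"].values()]


a = np.array([0.1, 0.2, 0.3], dtype=np.float64)
b = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float64)
c = np.array([0.1, 0.2], dtype=np.float64)

save_ragged("ragged.hdf5", [a, b, c])
restored = load_ragged("ragged.hdf5")
print([len(x) for x in restored])  # [3, 5, 2]
```

Each dataset keeps its own length, so nothing is padded; the trade-off is some per-dataset metadata in the file, which is usually small next to arrays of thousands of float64 values.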
