Storing multidimensional variable length array with h5py


Question

I'm trying to store a list of variable-length arrays in an HDF5 file with the following procedure:

import h5py
import numpy as np

phn_mfccs = []

# Import wav files
for waveform in files:
    phn_mfcc = mfcc(waveform)  # produces a variable length multidim array of shape (x, 13, 1)

    # Add MFCC and label to dataset
    # phn_mfccs has dimension (len(files),)
    # phn_mfccs[i] has variable dimension ([# of frames in ith segment] (variable), 13, 1)
    phn_mfccs.append(phn_mfcc)

dt = h5py.special_dtype(vlen=np.dtype('float64'))
mfccs_out.create_dataset('phn_mfccs', data=phn_mfccs, dtype=dt)

My datatype doesn't seem to work, though: instead of each element of the dataset in mfccs_out holding a multidimensional array, it holds just a 1D array. For example, if the first phn_mfcc I append has shape (59, 13, 1), then mfccs_out['phn_mfccs'][0] has shape (59,). I suspect this is because I'm only using a float64 datatype and need something else for an array of arrays. But if I don't specify the dtype, or try dtype='O', I get an error like "Object dtype 'O' has no native HDF equivalent."

Ideally, I'd like mfccs_out['phn_mfccs'][i] to contain the i-th phn_mfcc that I appended to the list phn_mfccs.

Recommended answer

The essence of your code is:

phn_mfccs = []
<loop several layers>
    phn_mfcc = <some sort of array expanded by one dimension>
    phn_mfccs.append(phn_mfcc) 

At the end of the loop, phn_mfccs is a list of arrays. I can't tell from the code what their dtype and shape are, or whether they differ from one element of the list to the next.

I'm not entirely sure what create_dataset does when given a list of arrays. It may wrap it in np.array.

mfccs_out.create_dataset('phn_mfccs', data=phn_mfccs, dtype=dt)

What does np.array(phn_mfccs) produce? What shape and dtype? If all the elements are arrays of the same shape and dtype, it produces a higher-dimensional array. If they differ in shape, it produces a 1D array with object dtype. Given the error message, I suspect the latter.
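A quick way to check this is sketched below with zero-filled placeholder arrays. The names `same`, `ragged`, `stacked`, and `obj` are illustrative and not from the original post. Recent NumPy releases refuse to build a ragged array implicitly, so the object array is filled slot by slot here rather than through a bare np.array call.

```python
import numpy as np

# Hypothetical stand-ins for the MFCC arrays; only the shapes matter here.
same = [np.zeros((5, 13, 1)), np.zeros((5, 13, 1))]
ragged = [np.zeros((59, 13, 1)), np.zeros((42, 13, 1))]

# Equal shapes: np.array stacks the list into one higher-dimensional array.
stacked = np.array(same)
print(stacked.shape, stacked.dtype)   # (2, 5, 13, 1) float64

# Differing shapes: build a 1D object array explicitly, one slot per array.
obj = np.empty(len(ragged), dtype=object)
for i, a in enumerate(ragged):
    obj[i] = a
print(obj.shape, obj.dtype, obj[0].shape)   # (2,) object (59, 13, 1)
```

The second case is the one that h5py cannot map to a native HDF5 type, which matches the "Object dtype 'O'" error in the question.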

I've answered a few vlen questions but haven't worked with vlen much myself. The h5py documentation on special types:

http://docs.h5py.org/en/latest/special.html

I vaguely recall that the 'ragged' dimension of an HDF5 array can only be 1D. So a phn_mfccs object array that contains 1D float arrays of varying lengths might work.

I might come up with a simple example. I also suggest you construct a simpler problem that we can copy, paste, and experiment with. We don't need to know how you read the data from your directory. We just need to understand the content of the array (list) that you are trying to write.

A 2015 post on vlen arrays:

Inexplicable behavior when using vlen with h5py

H5PY - How to store multiple 2D arrays of different dimensions

In [24]: f = h5py.File('vlen.h5','w')
In [25]: dt = h5py.special_dtype(vlen=np.dtype('float64'))
In [26]: dataset = f.create_dataset('vlen',(4,), dtype=dt)
In [27]: dataset.value
Out[27]: 
array([array([], dtype=float64), array([], dtype=float64),
       array([], dtype=float64), array([], dtype=float64)], dtype=object)
In [28]: for i in range(4):
    ...:     dataset[i]=np.arange(i+3)

In [29]: dataset.value
Out[29]: 
array([array([ 0.,  1.,  2.]), array([ 0.,  1.,  2.,  3.]),
       array([ 0.,  1.,  2.,  3.,  4.]),
       array([ 0.,  1.,  2.,  3.,  4.,  5.])], dtype=object)

If I try to write 2D arrays to dataset, I get an error:

OSError: Can't prepare for writing data (Src and dest data spaces have different sizes)

The dataset itself may be multidimensional, but each vlen element has to be a 1D array of floats.
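Given that constraint, one possible workaround (a sketch, not something tested against the asker's data) is to flatten each array before writing and restore the fixed trailing dimensions on read. It assumes every array has shape (frames, 13, 1), as stated in the question. The random placeholder data, the file name, and the `restored` list are hypothetical.

```python
import h5py
import numpy as np

# Hypothetical stand-ins for the MFCC arrays, each of shape (frames, 13, 1).
rng = np.random.default_rng(0)
phn_mfccs = [rng.random((n, 13, 1)) for n in (59, 42, 7)]

dt = h5py.special_dtype(vlen=np.dtype('float64'))

# In-memory file (core driver, no backing store) so nothing is left on disk.
with h5py.File('vlen_sketch.h5', 'w', driver='core', backing_store=False) as f:
    ds = f.create_dataset('phn_mfccs', (len(phn_mfccs),), dtype=dt)

    # Each vlen element must be 1D, so flatten before writing.
    for i, a in enumerate(phn_mfccs):
        ds[i] = a.ravel()

    # Only the first dimension varies, so reshape(-1, 13, 1) recovers it.
    restored = [ds[i].reshape(-1, 13, 1) for i in range(ds.shape[0])]

print([r.shape for r in restored])
```

If the trailing dimensions also varied, the shapes would have to be stored separately, for example in a second dataset; writing each array to its own dataset inside a group is another option.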
