How to create variable-length columns in an HDF5 file?

Views: 136 | Published: 2020/11/22 1:38:35 | Tags: python-3.x hdf5 h5py


Question

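I am using the h5py package to create an HDF5 file for my training set.

I want the first column to have variable length. For example, [1,2,3] would be the first entry in the column, [1,2,3,4,5] the second, and so on. The other 5 columns in the same dataset should hold data of type int with a fixed length of 1.

I tried the following code to handle this scenario: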

dt = h5py.special_dtype(vlen=np.dtype('int32'))
datatype = np.dtype([('FieldA', dt), ('FieldB', dt1), ('FieldC', dt1), ('FieldD', dt1), ('FieldE', dt1), ('FieldF', dt1)])

However, in the output I got only an empty array for each of the columns above in this dataset.

I also tried the following code:

dt = h5py.special_dtype(vlen=np.dtype('int32'))
data = db.create_dataset("data1", (5000,), dtype=dt)

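This gives me a dataset with only one column of variable-length entries. I want all 6 columns in the same dataset, with the first column holding variable-length entries as described above.

I am completely confused about how to solve this. Any help would be highly appreciated.

Answer

Do you want variable-length (ragged) columns, or just a column that can hold an array of data (up to the dtype limit)? The second is pretty straightforward. See the code below. (It's a simple example with 2 fields to demonstrate the method.)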

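import h5py
import numpy as np

# FieldA holds a fixed-size array of 4 int32 values per row; FieldB holds one int32.
my_dt = np.dtype([('FieldA', 'int32', (4,)), ('FieldB', 'int32')])

with h5py.File('SO_57260167.h5', 'w') as h5f:
    data = h5f.create_dataset("testdata", (10,), dtype=my_dt)

    for cnt in range(10):
        arr = np.random.randint(1, 1000, size=4)
        print(arr)
        data[cnt, 'FieldA'] = arr     # write the whole 4-element array
        data[cnt, 'FieldB'] = arr[0]  # write a single integer
        print(data[cnt]['FieldB'])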
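
If the lists have a known upper bound on their length, the fixed-size array field can also carry ragged data by padding each row and recording its true length. The following is only a sketch of that idea, not part of the original answer: MAX_LEN, the extra field FieldA_len, the dataset name 'padded', and the file name are all assumptions made for illustration, and the file is kept in memory via the 'core' driver so nothing is written to disk.

```python
import h5py
import numpy as np

MAX_LEN = 5  # assumed upper bound on the list length (hypothetical)

# FieldA holds up to MAX_LEN values; FieldA_len (hypothetical) records how many are real.
padded_dt = np.dtype([('FieldA', 'int32', (MAX_LEN,)),
                      ('FieldA_len', 'int32'),
                      ('FieldB', 'int32')])

ragged = [[1, 2, 3], [1, 2, 3, 4, 5], [7]]

# Build the table in memory first, then write it in one go.
table = np.zeros(len(ragged), dtype=padded_dt)
for i, row in enumerate(ragged):
    table['FieldA'][i, :len(row)] = row   # unused slots stay 0
    table['FieldA_len'][i] = len(row)
    table['FieldB'][i] = row[0]

# driver='core' with backing_store=False keeps the file in memory only.
with h5py.File('padded_demo.h5', 'w', driver='core', backing_store=False) as h5f:
    dset = h5f.create_dataset('padded', data=table)
    stored = dset[...]  # read everything back as a structured array

# Trim the padding again using the stored lengths.
recovered = [rec['FieldA'][:rec['FieldA_len']].tolist() for rec in stored]
print(recovered)
```

The trade-off is wasted space for short rows and a hard limit on row length, so this only fits data whose lengths are bounded and not too uneven.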

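If you want a variable-length ("ragged") column, I'm 99% sure you are limited to a single column when using the special dtype in a dataset. Also, I don't think you can name the fields/columns. (I couldn't get it to work, and couldn't find any examples.)
The code below modifies the example above to put the variable-length column in the dataset vl_data and the remaining integer data in the dataset fx_data.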

import h5py
import numpy as np

# Variable-length int32 type for the ragged column.
vl_dt = h5py.special_dtype(vlen=np.dtype('int32'))
# Compound type for the five fixed-length integer columns.
my_dt = np.dtype([('FieldB', 'int32'), ('FieldC', 'int32'), ('FieldD', 'int32'),
                  ('FieldE', 'int32'), ('FieldF', 'int32')])

with h5py.File('SO_57260167_vl.h5', 'w') as h5f:
    vl_data = h5f.create_dataset("testdata_vl", (10,), dtype=vl_dt)
    fx_data = h5f.create_dataset("testdata", (10,), dtype=my_dt)

    for cnt in range(10):
        arr = np.random.randint(1, 1000, size=cnt + 2)  # row length grows with cnt
        vl_data[cnt] = arr
        print(vl_data[cnt])
        fx_data[cnt, 'FieldB'] = arr[0]
        fx_data[cnt, 'FieldF'] = arr[-1]
        print(fx_data[cnt])
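
With the data split across two datasets, the logical table is recovered by pairing rows that share the same index. The sketch below shows one way to read both datasets back and join them. It is an illustration under assumptions rather than the answer author's code: the dataset names 'col_vl' and 'col_fx', the two-field layout, and the sample values are hypothetical, and the file lives in memory only. h5py 2.10 and later also provide h5py.vlen_dtype as an equivalent spelling of the special dtype used here.

```python
import h5py
import numpy as np

vl_dt = h5py.special_dtype(vlen=np.dtype('int32'))
fx_dt = np.dtype([('FieldB', 'int32'), ('FieldC', 'int32')])

ragged = [[1, 2, 3], [1, 2, 3, 4, 5], [7]]

# driver='core' with backing_store=False keeps the file in memory only.
with h5py.File('paired_demo.h5', 'w', driver='core', backing_store=False) as h5f:
    vl_data = h5f.create_dataset('col_vl', (len(ragged),), dtype=vl_dt)
    fx_data = h5f.create_dataset('col_fx', (len(ragged),), dtype=fx_dt)

    # Fill the fixed-length columns in memory, the ragged column row by row.
    fixed = np.zeros(len(ragged), dtype=fx_dt)
    for i, row in enumerate(ragged):
        vl_data[i] = np.array(row, dtype='int32')
        fixed['FieldB'][i] = row[0]      # first element of the row
        fixed['FieldC'][i] = len(row)    # row length, kept for convenience
    fx_data[...] = fixed

    # Row i of the logical table is (vl_data[i], fx_data[i]).
    rows = [(vl_data[i].tolist(),
             int(fx_data[i]['FieldB']),
             int(fx_data[i]['FieldC']))
            for i in range(len(ragged))]

print(rows)
```

Because nothing in the file ties the two datasets together, both must always be written and resized in step so that index i refers to the same sample in each.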

