Extracting datasets from 1 HDF5 file to multiple files


Question

I previously asked a question about generating images from an HDF5 file. Now I have another problem: generating separate h5 files from an existing one.

For instance, I have a file [ABC.h5] that contains the datasets for the images and their ground truth density maps. The keys are [images, density_maps].

I want to have [GT_001.h5], [GT_002.h5], ... instead of the single h5 file. Each of these files should hold the [density_maps] entry extracted for one image.

How can I achieve this? Thanks a lot.

Here is more related information. Thank you @kcw78 for the guidance. In the original dataset for CRSNet, each image file has its ground truth density map in a separate h5 file. This density map is <HDF5 dataset "density": shape (544, 932), type "<f4"> of <class 'h5py._hl.dataset.Dataset'>. Therefore, in this dataset, for each IMG_001.jpg there is a corresponding IMG_001.h5.

The dataset I have is a single h5 file containing: <HDF5 dataset "density_maps": shape (300, 380, 676, 1), type "<f4"> of <class 'h5py._hl.dataset.Dataset'>, and <HDF5 dataset "images": shape (300, 380, 676, 1), type "|u1"> of <class 'h5py._hl.dataset.Dataset'>.

I have successfully generated the corresponding images from the file. My current problem is how to loop over the density maps, copy each one to a new h5 file, and so build a corresponding density map h5 for each image. In other words, how can I produce IMG_001.h5, ... from this single h5 file?

Recommended answer

This answers your question based on my interpretation of your data. If it doesn't solve your problem, please clarify the summary below.

First, please be careful with the term "dataset". It has a specific meaning in h5py. You use "dataset" to refer to a set of data used for training and testing a CNN. That becomes confusing when there are also datasets IN an HDF5 file.

Based on your explanation, this is my understanding of the different files you have for training and testing.

Your original set of training and testing data in the CRSNet:
image files: IMG_###.jpg
ground truth density map files: IMG_###.h5, each containing one dataset: name="density"; shape=(544, 932); type="<f4"
You have pairs of image and density files: one .jpg and one .h5 file for each of IMG_001 through IMG_NNN.

Your new set of training and testing data:
H5 Filename: [ABC.h5]
H5 Dataset 1: name="images", shape=(300, 380, 676, 1), type="|u1"
H5 Dataset 2: name="density_maps", shape=(300, 380, 676, 1), type="<f4"
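If you want to confirm this summary against your own file, a short inspection script can list each top-level dataset with its shape and dtype. The following is only a sketch: the helper name `describe_datasets` and the small stand-in file `demo_ABC.h5` are made up for illustration, and the stand-in uses a tiny shape instead of the real (300, 380, 676, 1).

```python
import h5py
import numpy as np


def describe_datasets(path):
    """Return {name: (shape, dtype string)} for each top-level dataset."""
    info = {}
    with h5py.File(path, "r") as h5f:
        for name, obj in h5f.items():
            # Skip groups; only report objects that are h5py datasets
            if isinstance(obj, h5py.Dataset):
                info[name] = (obj.shape, obj.dtype.str)
    return info


# Hypothetical stand-in for ABC.h5, with a small shape so it runs quickly
with h5py.File("demo_ABC.h5", "w") as h5f:
    h5f.create_dataset("images", data=np.zeros((3, 4, 5, 1), dtype="u1"))
    h5f.create_dataset("density_maps", data=np.zeros((3, 4, 5, 1), dtype="<f4"))

summary = describe_datasets("demo_ABC.h5")
for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Pointing `describe_datasets` at the real file should show whether the names and shapes match the summary above before any extraction is attempted.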

You have extracted the data from the "images" dataset in this .h5 file to create IMG_###.jpg files (like your original set of training and testing data). Now you want to extract arrays from the "density_maps" dataset in the same .h5 file to create IMG_###.h5 files.

If so, the process is the same as the image extraction procedure. The only difference is that you write the data to a .h5 file instead of a .jpg file. The code below outlines the idea.

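import h5py

with h5py.File('yourfile.h5', 'r') as h5r:
    dmaps = h5r['density_maps']
    for i in range(dmaps.shape[0]):
        # Read the i-th density map from the stacked dataset
        dmap_arr = dmaps[i, :]
        # One output file per map; numbering starts at IMG_000 (use i+1 to start at IMG_001)
        with h5py.File(f'IMG_{i:03}.h5', 'w') as h5w:
            h5w.create_dataset('density_maps', data=dmap_arr)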

Note that when you read dmap_arr you will likely get shape=(380, 676, 1), because indexing only the first axis keeps the trailing axis of length 1. If so, you can reshape it with .reshape(380, 676), like this:

        dmap_arr = h5r['density_maps'][i,:].reshape(380, 676)
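Putting the pieces together, here is a self-contained sketch of the whole extraction. Several details are assumptions rather than part of the answer above: the function name `split_density_maps`, numbering that starts at IMG_001, `np.squeeze` in place of a hard-coded `.reshape(380, 676)`, and writing each map under the dataset name "density" to mirror the original per-image files described in the question. The demo data is random and uses a small shape so it runs quickly.

```python
import os
import tempfile

import h5py
import numpy as np


def split_density_maps(src_path, out_dir, src_name="density_maps",
                       dst_name="density", start=1):
    """Write each map in src_name to its own IMG_NNN.h5 file in out_dir."""
    written = []
    with h5py.File(src_path, "r") as h5r:
        dset = h5r[src_name]
        for i in range(dset.shape[0]):
            # Read one map, e.g. (380, 676, 1), and drop the trailing axis.
            # np.squeeze raises an error if that axis is not of length 1.
            dmap = np.squeeze(dset[i], axis=-1)
            out_path = os.path.join(out_dir, f"IMG_{i + start:03}.h5")
            with h5py.File(out_path, "w") as h5w:
                h5w.create_dataset(dst_name, data=dmap)
            written.append(out_path)
    return written


# Demo with a small random stand-in for the real (300, 380, 676, 1) dataset
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "ABC.h5")
demo = np.random.rand(3, 4, 5, 1).astype("f4")
with h5py.File(src, "w") as h5f:
    h5f.create_dataset("density_maps", data=demo)

files = split_density_maps(src, work_dir)
with h5py.File(files[0], "r") as h5f:
    first = h5f["density"][...]
print(len(files), first.shape)
```

If your training code expects a different dataset name or file prefix, adjust `dst_name` and the f-string accordingly; reading one slice at a time keeps memory use low even for the full 300-map file.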
