Extracting datasets from 1 HDF5 file to multiple files
Problem description
I previously asked a question about generating images from an HDF5 file. Now I have another problem: generating h5 files from an existing one.
For instance, I have a file [ABC.h5] that contains the datasets for the images and their ground-truth density maps. The keys are [images, density_maps].
Instead of the single h5 file, I want [GT_001.h5], [GT_002.h5], ...: one file per image, each holding the [density_maps] entry extracted for that image.
How can I achieve this? Thanks a lot.
Here is more related information. Thank you @kcw78 for the guidance. In the original dataset used with CRSNet, each image file has its ground-truth density map in an h5 file. This density map is <HDF5 dataset "density": shape (544, 932), type "<f4"> <class 'h5py._hl.dataset.Dataset'>. Therefore, in this dataset, for each IMG_001.jpg there is a corresponding IMG_001.h5.
In the dataset I have, there is a single h5 file containing: <HDF5 dataset "density_maps": shape (300, 380, 676, 1), type "<f4"> <class 'h5py._hl.dataset.Dataset'> and <HDF5 dataset "images": shape (300, 380, 676, 1), type "|u1"> <class 'h5py._hl.dataset.Dataset'>.
I have successfully generated the corresponding images from the file. My current problem is how to loop over the dataset, copy each entry to a new h5 file, and so build a corresponding density-map h5 for each image. In other words, how can I produce IMG_001.h5, ... from this single h5 file?
Recommended answer
This answers your question based on my interpretation of your data. If it doesn't solve your problem, please clarify the summary below.
First, please be careful with the term "dataset". It has a specific meaning in h5py. You use "dataset" to refer to a set of data used for training and testing a CNN. That gets confusing when there are also datasets IN an HDF5 file.
Based on your explanation, this is my understanding of the different files you have for training and testing.
Your original set of training and testing data in the CRSNet:
image files: IMG_###.jpg
ground truth density map files: IMG_###.h5 with attributes: name="density"; shape=(544, 932); type="<f4"
You have pairs of image and density files: one .jpg and one .h5 file for each of IMG_001 through IMG_NNN.
Your new set of training and testing data:
H5 Filename: [ABC.h5]
H5 Dataset 1: name="images", shape=(300, 380, 676, 1), type="|u1"
H5 Dataset 2: name="density_maps", shape=(300, 380, 676, 1), type="<f4"
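One way to confirm this layout before extracting anything is to list every dataset in the file with its shape and dtype. The following is a minimal sketch, assuming h5py is installed; the helper name `describe_h5` is hypothetical and not part of h5py.

```python
import h5py


def describe_h5(path):
    """Return {dataset_name: (shape, dtype_string)} for every dataset in the file."""
    info = {}

    def visitor(name, obj):
        # visititems walks both groups and datasets; record datasets only
        if isinstance(obj, h5py.Dataset):
            info[name] = (obj.shape, obj.dtype.str)

    with h5py.File(path, "r") as h5r:
        h5r.visititems(visitor)
    return info
```

Calling `describe_h5('ABC.h5')` on a file laid out as described above should report the two datasets "images" and "density_maps" with their 4-D shapes.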
You have extracted the data from the "images" dataset in this .h5 file to create IMG_###.jpg (like your original set of training and testing data). Now you want to extract arrays from the "density_maps" dataset in the .h5 file to create IMG_###.h5.
If so, the process is the same as the image extraction procedure. The only difference is that you write the data to a .h5 file instead of a .jpg file. See the code below.
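import h5py

# Open the combined file read-only and loop over the first axis (one entry per image)
with h5py.File('yourfile.h5', 'r') as h5r:
    for i in range(h5r['density_maps'].shape[0]):
        dmap_arr = h5r['density_maps'][i, :]  # density map for image i
        # One output file per map; use {i+1:03} if numbering should start at 001
        with h5py.File(f'IMG_{i:03}.h5', 'w') as h5w:
            h5w.create_dataset('density_maps', data=dmap_arr)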
Note that when you read dmap_arr, you may get shape=(380, 676, 1). If so, you can reshape it with .reshape(380, 676), like this:
dmap_arr = h5r['density_maps'][i,:].reshape(380, 676)
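Putting the loop and the reshape together, a fuller sketch might look like the one below. It rests on several assumptions: the function name `split_density_maps` and its parameters are hypothetical, output files are numbered from 001 to match the IMG_001 naming in the question, and each output dataset is named "density" to mirror the original per-image files (change `out_key` if your training code expects another name).

```python
import os
import h5py


def split_density_maps(src_path, out_dir, key="density_maps",
                       out_key="density", start=1):
    """Write each entry of src_path[key] to its own IMG_###.h5 file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with h5py.File(src_path, "r") as h5r:
        dset = h5r[key]
        for i in range(dset.shape[0]):
            dmap = dset[i]  # reads only entry i into a NumPy array
            # Drop a trailing channel axis of length 1: (H, W, 1) -> (H, W)
            if dmap.ndim == 3 and dmap.shape[-1] == 1:
                dmap = dmap[..., 0]
            out_path = os.path.join(out_dir, f"IMG_{i + start:03}.h5")
            with h5py.File(out_path, "w") as h5w:
                # out_key is an assumption; see the note above
                h5w.create_dataset(out_key, data=dmap)
            written.append(out_path)
    return written
```

For example, `split_density_maps('ABC.h5', 'ground_truth')` would be expected to write one file per entry along the first axis of "density_maps" into the `ground_truth` folder. Indexing the last axis instead of hard-coding `.reshape(380, 676)` keeps the sketch independent of the image size.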