How to split a big HDF5 file into multiple small HDF5 datasets


Question

I have a big HDF5 file with the images and its corresponding ground truth density map. I want to put them into the network CRSNet and it requires the images in separate files. How can I achieve that? Thank you very much.

-- Basic info: I have an HDF5 file with two keys, "images" and "density_maps". Their shapes are (300, 380, 676, 1). 300 stands for the number of images; 380 and 676 refer to the height and width respectively.

-- What I need to put into the CRSNet network are the images (jpg) with their corresponding HDF5 files. The shape of them would be (572, 945).

Thanks a lot for any comment and discussion!

Answer

For starters, a quick clarification on h5py and HDF5. h5py is a Python package to read HDF5 files. You can also read HDF5 files with the PyTables package (and with other languages: C, C++, FORTRAN).

I'm not entirely sure what you mean by "the images (jpg) with their corresponding h5py (HDF5) files". As I understand it, all of your data is in 1 HDF5 file. Also, I don't understand what you mean by "The shape of them would be (572, 945)." This is different from the image data, right? Please update your post to clarify these items.

It's relatively easy to extract data from a dataset. This is how you can get the "images" as NumPy arrays and use cv2 to write them as individual jpg files. See the code below:

import h5py
import cv2

with h5py.File('yourfile.h5', 'r') as h5f:
    for i in range(h5f['images'].shape[0]):
        image_arr = h5f['images'][i, :]   # slice notation gets [i,:,:,:]
        # cv2.imwrite expects uint8 data; scale/convert first if the dataset stores floats
        cv2.imwrite(f'test_img_{i:03}.jpg', image_arr)

Before you start coding, are you sure you need the images as individual image files, or individual image data (usually NumPy arrays)? I ask because the first step in most CNN processes is reading the images and converting them to arrays for downstream processing. You already have the arrays in the HDF5 file. All you may need to do is read each array and save to the appropriate data structure for CRSNet to process them. For example, here is the code to create a list of arrays (used by TensorFlow and Keras):

import h5py

image_list = []
with h5py.File('yourfile.h5', 'r') as h5f:
    for i in range(h5f['images'].shape[0]):
        image_list.append(h5f['images'][i, :])  # gets slice [i,:,:,:]
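If you do end up needing one small HDF5 file per sample (as the question title asks), the same loop pattern works for writing as well as reading. Here is a minimal, self-contained sketch: it first builds a tiny stand-in for your big file, then splits the "density_maps" dataset into one small file per image. The output file names (`sample_000.h5`, ...) and the output key `'density'` are assumptions, not anything CRSNet requires; adjust them to match whatever your data loader expects.

```python
import numpy as np
import h5py

# Build a tiny stand-in for the big file (3 images of 4x5 pixels) so the
# sketch is runnable; with real data, skip this and open your existing file.
with h5py.File('yourfile.h5', 'w') as h5f:
    h5f.create_dataset('images', data=np.random.rand(3, 4, 5, 1))
    h5f.create_dataset('density_maps', data=np.random.rand(3, 4, 5, 1))

# Split: one small HDF5 file per density map, numbered to match the image index.
with h5py.File('yourfile.h5', 'r') as h5f:
    for i in range(h5f['density_maps'].shape[0]):
        with h5py.File(f'sample_{i:03}.h5', 'w') as h5out:
            h5out.create_dataset('density', data=h5f['density_maps'][i, :])
```

Each small file then holds a single (380, 676, 1) array under the `'density'` key, which pairs naturally with the jpg written for the same index `i` above.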
        
