Convert a folder comprising jpeg images to hdf5

Question

Is there a way to convert a folder comprising .jpeg images to hdf5 in Python? I am trying to build a neural network model for classification of images. Thanks!

Answer

There are a lot of ways to process and save image data. Here are 2 variations of a method that reads all of the image files in 1 folder and loads them into an HDF5 file. Outline of this process:

  1. Count the number of images (used to size the dataset).
  2. Create HDF5 file (prefixed: 1ds_)
  3. Create empty dataset with appropriate shape and type (integers)
  4. Use glob.iglob() to loop over images. Then do:
    • Read with cv2.imread()
    • Resize with cv2.resize()
    • Copy to the dataset img_ds[cnt:cnt+1:,:,:]

This is ONE way to do it. Additional things to consider:

  1. I loaded all images in 1 dataset. If you have different size images, you must resize the images. If you don't want to resize, you need to save each image in a different dataset (same process, but create a new dataset inside the loop). See the second with/as: block and loop that saves the data to the 2nd HDF5 file (prefixed: nds_).
  2. I didn't try to capture image names. You could do that with attributes on 1 dataset, or as the dataset name with multiple datasets; a sketch of one way to do this follows this list.
  3. My images are .ppm files, so you need to modify the glob functions to use *.jpg.
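
A minimal sketch of the name-capture idea for the single-dataset case: it keeps the original filenames in a small companion dataset next to 'images' (a variation of the attribute idea above). The output file name, the dataset names, and the *.jpg pattern are illustrative, not taken from the original code.

import glob
import h5py
import cv2

IMG_WIDTH = 30
IMG_HEIGHT = 30

# sort so that the order of 'image_names' matches the order of 'images'
files = sorted(glob.glob('./*.jpg'))

with h5py.File('images_with_names.h5', 'w') as h5f:
    img_ds = h5f.create_dataset('images', shape=(len(files), IMG_WIDTH, IMG_HEIGHT, 3), dtype='uint8')
    # companion dataset holding one filename per image, in the same order as 'images'
    h5f.create_dataset('image_names', data=[f.encode('utf-8') for f in files])
    for cnt, ifile in enumerate(files):
        img = cv2.imread(ifile, cv2.IMREAD_COLOR)
        img_ds[cnt] = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))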

Simpler Version Below (added Mar 16 2021):
Assumes all files are in the current folder, AND loads all resized images to one dataset (named 'images'). See the previous code (further below) for the second method that loads each image into a separate dataset without resizing.

import sys
import glob
import h5py
import cv2

IMG_WIDTH = 30
IMG_HEIGHT = 30

h5file = 'import_images.h5'

nfiles = len(glob.glob('./*.ppm'))
print(f'count of image files nfiles={nfiles}')

# resize all images and load into a single dataset
with h5py.File(h5file,'w') as  h5f:
    img_ds = h5f.create_dataset('images',shape=(nfiles, IMG_WIDTH, IMG_HEIGHT,3), dtype=int)
    for cnt, ifile in enumerate(glob.iglob('./*.ppm')) :
        img = cv2.imread(ifile, cv2.IMREAD_COLOR)
        # or use cv2.IMREAD_GRAYSCALE, cv2.IMREAD_UNCHANGED
        img_resize = cv2.resize( img, (IMG_WIDTH, IMG_HEIGHT) )
        img_ds[cnt:cnt+1:,:,:] = img_resize
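
Once the file exists, the whole 'images' dataset can be read back into a NumPy array for model training. A minimal sketch of the read side, assuming the file and dataset names from the script above; the 0-255 scaling at the end is a common preprocessing choice, not part of the original answer.

import h5py
import numpy as np

with h5py.File('import_images.h5', 'r') as h5f:
    images = h5f['images'][:]    # array of shape (nfiles, IMG_WIDTH, IMG_HEIGHT, 3)

print(images.shape, images.dtype)
x = images.astype(np.float32) / 255.0    # typical scaling before feeding a neural network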

Previous Code Below (from Mar 15 2021):

import sys
import glob
import h5py
import cv2

IMG_WIDTH = 30
IMG_HEIGHT = 30

# Check command-line arguments
if len(sys.argv) != 3:
    sys.exit("Usage: python load_images_to_hdf5.py data_directory model.h5")

print ('data_dir =', sys.argv[1])
data_dir = sys.argv[1]
print ('Save model to:', sys.argv[2])
h5file = sys.argv[2]

nfiles = len(glob.glob(data_dir + '/*.ppm'))
print(f'Reading dir: {data_dir}; nfiles={nfiles}')

# resize all images and load into a single dataset
with h5py.File('1ds_'+h5file,'w') as  h5f:
    img_ds = h5f.create_dataset('images',shape=(nfiles, IMG_WIDTH, IMG_HEIGHT,3), dtype=int)
    for cnt, ifile in enumerate(glob.iglob(data_dir + '/*.ppm')) :
        img = cv2.imread(ifile, cv2.IMREAD_COLOR)
        # or use cv2.IMREAD_GRAYSCALE, cv2.IMREAD_UNCHANGED
        img_resize = cv2.resize( img, (IMG_WIDTH, IMG_HEIGHT) )
        img_ds[cnt:cnt+1:,:,:] = img_resize

# load each image into a separate dataset (image NOT resized)    
with h5py.File('nds_'+h5file,'w') as  h5f:
    for cnt, ifile in enumerate(glob.iglob(data_dir + '/*.ppm')) :
        img = cv2.imread(ifile, cv2.IMREAD_COLOR)
        # or use cv2.IMREAD_GRAYSCALE, cv2.IMREAD_UNCHANGED
        img_ds = h5f.create_dataset('images_'+f'{cnt+1:03}', data=img)
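
In the multi-dataset file each image lives under its own name ('images_001', 'images_002', ...), so reading it back means iterating over the keys. A minimal sketch, assuming the nds_-prefixed file written by the loop above (the actual file name depends on sys.argv[2]):

import h5py

with h5py.File('nds_model.h5', 'r') as h5f:    # replace with your actual nds_ file name
    for name in sorted(h5f.keys()):
        img = h5f[name][:]    # each dataset keeps the image's original size
        print(name, img.shape)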
