How to split data into 3 sets (train, validation, test) using ImageDataGenerator when each class's data is in its own directory

Problem description

How do I split my data into three sets (train, validation, test) using Keras's ImageDataGenerator? ImageDataGenerator only provides a validation_split argument, so if I use it I won't have a test set left for later use.

My data is laid out as follows:

> input_data_dir
  > class_1_dir
    > image_1.png
    > image_2.png
  > class_2_dir
  > class_3_dir

Recommended answer

As you rightly mentioned, Keras's ImageDataGenerator cannot split the data into three sets in a single line of code.

A workaround is to store the images for the test set in a separate folder and apply ImageDataGenerator to each folder, as shown below:

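# Assumes the TensorFlow-bundled Keras; with standalone Keras, import from keras.preprocessing.image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Path to Training Directory
train_dir = 'Dogs_Vs_Cats_Small/train'

# Path to Test Directory
test_dir = 'Dogs_Vs_Cats_Small/test'

# rescale must be passed by keyword: the first positional argument is featurewise_center
Train_Gen = ImageDataGenerator(rescale=1./255)
Test_Gen = ImageDataGenerator(rescale=1./255)

Train_Generator = Train_Gen.flow_from_directory(train_dir, target_size=(150, 150), batch_size=20, class_mode='binary')

Test_Generator = Test_Gen.flow_from_directory(test_dir, target_size=(150, 150), batch_size=20, class_mode='binary')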
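
To get all three sets, one possible approach is to keep the separate test folder and carve the validation set out of the training folder with validation_split together with the subset argument of flow_from_directory. The following is a minimal sketch under that assumption; the helper name make_generators, the 0.2 validation fraction, and the default class_mode are illustrative choices, not part of the original answer.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_generators(train_dir, test_dir, val_fraction=0.2,
                    target_size=(150, 150), batch_size=20,
                    class_mode='categorical'):
    """Return (train, validation, test) directory iterators.

    Hypothetical helper: the validation images are taken from train_dir,
    while test_dir holds images that are never used during training.
    """
    # One generator serves both subsets so they share the same split.
    train_val_gen = ImageDataGenerator(rescale=1. / 255,
                                       validation_split=val_fraction)
    test_gen = ImageDataGenerator(rescale=1. / 255)

    common = dict(target_size=target_size, batch_size=batch_size,
                  class_mode=class_mode)
    train = train_val_gen.flow_from_directory(
        train_dir, subset='training', **common)
    val = train_val_gen.flow_from_directory(
        train_dir, subset='validation', **common)
    # shuffle=False keeps predictions aligned with test.filenames.
    test = test_gen.flow_from_directory(test_dir, shuffle=False, **common)
    return train, val, test
```

For the two-class cats/dogs folders used here, class_mode='binary' can be passed instead of the default. The split is taken per class directory, so each class contributes roughly val_fraction of its images to the validation subset.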

The sample code below copies some images from the original directory into two separate folders, train and test, which may help you:

import os, shutil

# Path to the directory where the original dataset was uncompressed
original_dataset_dir = 'Dogs_Vs_Cats'

# Directory where you’ll store your smaller dataset
base_dir = 'Dogs_Vs_Cats_Small2'

os.mkdir(base_dir)

# Directory for the training splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)

# Directory for the test splits
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

# Directory with training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)

# Directory with training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)

# Directory with Test Cat Pictures
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)

# Directory with Test Dog Pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)

# Copies the first 1,000 cat images to train_cats_dir. 
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copies cat images 1500-1999 to test_cats_dir (images 1000-1499 are not used here)
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)

# Copies the first 1,000 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Copies dog images 1500-1999 to test_dogs_dir (images 1000-1499 are not used here)
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, 'train', fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Sanity check to ensure that the training and test folders have the expected number of images

print('Number of Cat Images in Training Directory is {}'.format(len(os.listdir(train_cats_dir))))
print('Number of Dog Images in Training Directory is {}'.format(len(os.listdir(train_dogs_dir))))
print('Number of Cat Images in Testing Directory is {}'.format(len(os.listdir(test_cats_dir))))
print('Number of Dog Images in Testing Directory is {}'.format(len(os.listdir(test_dogs_dir))))
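
The copy loops above are tied to the cats/dogs file names. For the layout in the question (one sub-directory per class), the same idea could be generalized roughly as follows. This is a sketch rather than part of the original answer: the function name split_dataset, the output folder names train/val/test, the default fractions, and the fixed seed are all assumptions chosen for illustration.

```python
import os
import random
import shutil


def split_dataset(input_dir, output_dir, fractions=(0.7, 0.15, 0.15), seed=0):
    """Copy input_dir/<class>/<file> into output_dir/{train,val,test}/<class>/.

    Returns a dict mapping (split, class_name) to the number of files copied.
    Hypothetical helper; the last split receives whatever remains after
    the first two fractions are truncated to whole files.
    """
    split_names = ('train', 'val', 'test')
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    counts = {}
    for class_name in sorted(os.listdir(input_dir)):
        class_dir = os.path.join(input_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        files = sorted(f for f in os.listdir(class_dir)
                       if os.path.isfile(os.path.join(class_dir, f)))
        rng.shuffle(files)
        n_train = int(len(files) * fractions[0])
        n_val = int(len(files) * fractions[1])
        chunks = (files[:n_train],
                  files[n_train:n_train + n_val],
                  files[n_train + n_val:])
        for split, chunk in zip(split_names, chunks):
            dst_dir = os.path.join(output_dir, split, class_name)
            os.makedirs(dst_dir, exist_ok=True)
            for fname in chunk:
                shutil.copyfile(os.path.join(class_dir, fname),
                                os.path.join(dst_dir, fname))
            counts[(split, class_name)] = len(chunk)
    return counts
```

Each of the resulting train, val, and test folders can then be passed to its own flow_from_directory call, so validation_split is not needed at all.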

Hope this helps.
