Does ImageDataGenerator add more images to my dataset?


Question

I'm trying to do image classification with the Inception V3 model. Does ImageDataGenerator from Keras create new images that are added to my dataset? If I have 1000 images, will using it double the dataset to 2000 training images? Is there a way to know how many images were created and fed into the model?

Recommended answer

Short answer: 1) In every epoch, the original images are simply transformed (rotated, zoomed, etc.) and the transformed versions are used for training, and 2) the number of images in each epoch is therefore equal to the number of original images you have.

Long answer: In each epoch, ImageDataGenerator applies a random transformation to each image you have and uses the transformed images for training. The available transformations include rotation, zooming, etc. In this sense you are creating new data (this is what data augmentation means), but the generated images are not totally different from the original ones. The learned model may end up more robust and accurate, because it is trained on different variations of the same image.

You need to set the steps_per_epoch argument of the fit method to n_samples / batch_size, where n_samples is the total number of training samples you have (i.e. 1000 in your case). This way each training sample is augmented exactly once per epoch, so 1000 transformed images are generated in each epoch.
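As a minimal sketch of that arithmetic: n_samples / batch_size is not always a whole number, so the helper below rounds up to make sure every sample is visited once per epoch. The rounding, the helper name steps_per_epoch and the batch size of 32 are assumptions added here for illustration; they are not part of the original answer.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once per epoch."""
    # Round up so a final, smaller batch covers the leftover samples.
    return math.ceil(n_samples / batch_size)


n_samples = 1000   # size of the training set from the question
batch_size = 32    # assumed value, chosen only for illustration

steps = steps_per_epoch(n_samples, batch_size)
# Size of the last batch when batch_size does not divide n_samples evenly.
last_batch = n_samples - (steps - 1) * batch_size

print(steps)       # 32
print(last_batch)  # 8
```

With a batch size that divides the dataset evenly (for example 50), the division is already exact and no rounding is needed.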

Further, I think it's worth clarifying the meaning of "augmentation" in this context. When we use ImageDataGenerator with its augmentation options enabled, we are augmenting the images. But "augmentation" here does not mean that if we have, say, 100 original training images we end up with 1000 images per epoch (i.e. the number of training images per epoch does not increase). Instead, it means we use a different transformation of each image in each epoch. Hence, if we train our model for, say, 5 epochs, we have used 5 different versions of each original image (or 100 * 5 = 500 different images over the whole training, instead of just the 100 original images). To put it differently, the total number of unique images grows over the whole training run from start to finish, not within each epoch.
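The counting argument above can be illustrated with a toy simulation in plain Python. This is not Keras code: random_transform is a hypothetical stand-in that pairs an image id with a random rotation angle, and the angle range and seed are arbitrary assumptions. It only shows that the per-epoch count stays at the number of originals while the number of distinct variants seen grows across epochs.

```python
import random


def random_transform(image_id, rng):
    # Hypothetical stand-in for a random augmentation:
    # pair the image id with a random rotation angle in degrees.
    return (image_id, round(rng.uniform(-30.0, 30.0), 3))


def train(n_images, n_epochs, seed=0):
    rng = random.Random(seed)
    per_epoch_counts = []
    seen = set()
    for _ in range(n_epochs):
        # Every original image is transformed exactly once per epoch.
        epoch_images = [random_transform(i, rng) for i in range(n_images)]
        per_epoch_counts.append(len(epoch_images))
        seen.update(epoch_images)
    return per_epoch_counts, len(seen)


per_epoch_counts, total_variants = train(n_images=100, n_epochs=5)
print(per_epoch_counts)  # [100, 100, 100, 100, 100]
print(total_variants)    # at most 100 * 5 = 500 distinct variants
```

The total can fall slightly short of 500 if two epochs happen to draw the same transformation for the same image, which is why the example reports an upper bound rather than an exact figure.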
