keras image preprocessing unbalanced data

Problem description

I'm trying to use Keras to do image classification on two classes. For one class I have a very limited number of images, say 500. For the other class I have an almost infinite number of images. How can I use Keras image preprocessing in this situation? Ideally, I need something like this: for class one, I feed in the 500 images and use ImageDataGenerator to generate more; for class two, each time I extract 500 images in sequence from the 1,000,000-image dataset, probably without any data augmentation. Looking at the example here and at the Keras documentation, I found that the training folder contains an equal number of images for each class by default. So my question is: is there an existing API for doing this? If so, please point me to it. If not, is there a workaround?

Recommended answer

You have a few options.

Option 1

Use the class_weight parameter of the fit() function, which is a dictionary mapping class indices to weight values. Let's say you have 500 samples of class 0 and 1500 samples of class 1; then you pass class_weight = {0: 3, 1: 1}. That gives class 0 three times the weight of class 1, so each class-0 sample contributes three times as much to the loss.
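The 3:1 ratio above is just "size of the largest class divided by the size of each class". A minimal sketch of that arithmetic in plain Python; the helper name `inverse_frequency_weights` is hypothetical, and the `model.fit` line is left as a comment because it assumes an already compiled Keras `model` and a `train_generator` that yields integer labels 0/1.

```python
def inverse_frequency_weights(counts):
    """Weight each class by (largest class count / its own count).

    counts: dict mapping class index -> number of training samples.
    """
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


# 500 samples of class 0, 1500 samples of class 1, as in the text above.
weights = inverse_frequency_weights({0: 500, 1: 1500})
print(weights)  # {0: 3.0, 1: 1.0}

# Assuming a compiled Keras model and a generator yielding labels 0/1:
# model.fit(train_generator, epochs=10, class_weight=weights)
```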

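train_generator.classes gives you the class label (an integer index) of every training image, which you can count to derive the weights; train_generator.class_indices tells you which folder name each index corresponds to.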
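As a sketch of how those two attributes fit together, the snippet below counts samples per class and maps the indices back to folder names. The `classes` list and the `class_indices` dict are hypothetical stand-ins (folder names "rare" and "common" are made up); with a real generator you would use `train_generator.classes` and `train_generator.class_indices` instead.

```python
from collections import Counter

# Hypothetical stand-ins for a flow_from_directory generator's attributes:
# `class_indices` maps folder name -> integer label,
# `classes` holds one integer label per image found on disk.
class_indices = {"rare": 0, "common": 1}
classes = [0] * 500 + [1] * 1500

counts = Counter(classes)                                   # label -> count
index_to_name = {idx: name for name, idx in class_indices.items()}

for idx in sorted(counts):
    print(index_to_name[idx], counts[idx])
# rare 500
# common 1500
```

The resulting `counts` can be fed straight into a weighting rule such as the 3:1 example above.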

If you want to calculate this programmatically, you can use scikit-learn's sklearn.utils.compute_class_weight(): https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/class_weight.py

The function looks at the distribution of labels and produces weights that penalize under- and over-represented classes in the training set equally.
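With class_weight="balanced", scikit-learn computes each weight as n_samples / (n_classes * count_of_class). The sketch below reproduces that formula in plain Python so it runs without scikit-learn; the helper name `balanced_class_weights` is hypothetical. The equivalent scikit-learn call is shown in a comment (recent versions require keyword arguments).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Same formula as scikit-learn's class_weight="balanced":
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}


labels = [0] * 500 + [1] * 1500
weights = balanced_class_weights(labels)
print(weights)  # class 0 -> 2.0, class 1 -> about 0.667 (still a 3:1 ratio)

# With scikit-learn and NumPy installed:
#   import numpy as np
#   from sklearn.utils.class_weight import compute_class_weight
#   classes = np.unique(labels)
#   w = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
#   weights = dict(zip(classes, w))
```

Unlike the "largest / count" rule, the balanced weights are scaled so that the weighted sample count still equals the number of samples; the ratio between classes is the same.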

See also this useful thread here: https://github.com/fchollet/keras/issues/1875

This thread might also be of help: Is it possible to automatically infer the class_weight from flow_from_directory in Keras?

Option 2

Do a dummy training run with a generator that applies your image augmentation (rotation, scaling, cropping, flipping, etc.) and saves the augmented images for the real training later. That way you can create a bigger, or even balanced, dataset for your under-represented class.

In this dummy run, set save_to_dir in flow_from_directory to a folder of your choosing, and afterwards take only the images from the class you need more samples of. You obviously discard any training results, since you only use this run to get more data.
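A minimal sketch of this idea, assuming TensorFlow 2.x with `tf.keras` where `ImageDataGenerator` is still available (it is deprecated in recent releases). Two small variations on the answer: instead of a full dummy training run, the sketch just draws batches from the generator, since that alone triggers the augmentation and the writing to `save_to_dir`; and it passes `classes=[minority_class]` so only the under-represented folder is read, rather than filtering the saved files afterwards. The helper names, directory layout and augmentation settings are hypothetical.

```python
import math
import os


def batches_needed(n_have, n_target, batch_size):
    """How many generator batches to draw so that the original images plus
    the saved augmented copies reach roughly n_target images."""
    missing = max(n_target - n_have, 0)
    return math.ceil(missing / batch_size)


def save_augmented_minority(src_dir, minority_class, out_dir, n_batches,
                            target_size=(224, 224), batch_size=32):
    """Draw n_batches augmented batches of one class and write them to out_dir."""
    # Imported here so batches_needed() works without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    os.makedirs(out_dir, exist_ok=True)  # save_to_dir must already exist
    datagen = ImageDataGenerator(
        rotation_range=20,      # random rotations within +/-20 degrees
        zoom_range=0.15,        # random zoom between 0.85x and 1.15x
        horizontal_flip=True,   # random left-right flips
    )
    gen = datagen.flow_from_directory(
        src_dir,                   # one sub-folder per class, e.g. "data/train"
        target_size=target_size,
        classes=[minority_class],  # only read the under-represented class
        class_mode=None,           # yield images only, no labels
        batch_size=batch_size,
        save_to_dir=out_dir,       # every generated image is written here
        save_prefix="aug",
        save_format="jpeg",
    )
    # Drawing a batch is what runs the augmentation and saves the files.
    # A partial last batch in each epoch can leave the total slightly short.
    for _ in range(n_batches):
        next(gen)
```

For the numbers in the question, `batches_needed(500, 1500, 32)` returns 32; a call such as `save_augmented_minority("data/train", "rare", "data/augmented_rare", 32)` (hypothetical paths and folder name) would then write the extra images, which you copy into the minority class folder before the real training. Keeping `out_dir` outside `src_dir` avoids the generator re-reading its own output.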
