How to oversample an image dataset using Python?
Problem description
I am working on a multiclass classification problem with an imbalanced image dataset (images from different classes). I tried the imblearn library, but it does not work on the image dataset.
I have a dataset of images belonging to three classes, namely A, B and C. Class A has 1000 images, B has 300 and C has 100. I want to oversample classes B and C to reduce the class imbalance. How can I oversample an image dataset using Python?
Recommended answer
Thanks for the clarification. In general, you don't oversample inside your Python training code. Rather, you pre-process your dataset, duplicating the images of the under-represented classes. In the case you cite, you might duplicate everything in class B (300 → 600 images) and make 5 copies of everything in class C (100 → 500 images). This gives you a new balance of 1000:600:500, which is likely more palatable to your training routine. Instead of the original 1400 images, you now shuffle 2100.
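The answer describes the duplication only in words. A minimal sketch of that pre-processing step, assuming the images sit in one sub-folder per class (for example `data/train/A`, `data/train/B`, `data/train/C`; these paths and the helper name `duplicate_classes` are hypothetical), copies every file a fixed number of times into a new folder using only the standard library:

```python
import shutil
from pathlib import Path


def duplicate_classes(src_root, dst_root, copies_per_class):
    """Hypothetical helper: copy every file in src_root/<class>/ into
    dst_root/<class>/, writing it copies_per_class[<class>] times
    (classes not listed are copied once). Assumes one sub-folder per class.
    Returns the number of files written per class."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    counts = {}
    for class_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        n_copies = copies_per_class.get(class_dir.name, 1)
        out_dir = dst_root / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        written = 0
        for img in sorted(p for p in class_dir.iterdir() if p.is_file()):
            for k in range(n_copies):
                # Add the copy index to the name so duplicates do not overwrite each other.
                target = out_dir / f"{img.stem}_dup{k}{img.suffix}"
                shutil.copy2(img, target)
                written += 1
        counts[class_dir.name] = written
    return counts
```

Called as `duplicate_classes("data/train", "data/train_balanced", {"B": 2, "C": 5})` on the question's 1000/300/100 images, this writes 1000, 600 and 500 files, i.e. the 2100-image set from the answer, which the training code can then shuffle as usual. Copying files on disk is the literal reading of "pre-process your dataset"; if disk space matters, the same per-class multiplication can be applied to a list of file paths instead.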
Does that solve your problem?