How to prepare training data for image classification


Question

I'm new to machine learning and am having trouble with image classification. I'm trying to distinguish cats from dogs using a simple classifier, K-Nearest Neighbours.

My code so far:

import os

import cv2
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

DATADIR = "/Users/me/Desktop/ds2/ML_image_classification/kagglecatsanddogs_3367a/PetImages"
CATEGORIES = ['Dog', 'Cat']

IMG_SIZE = 30
data = []
categories = []

for category in CATEGORIES:
    path = os.path.join(DATADIR, category) 
    categ_id = CATEGORIES.index(category)
    for img in os.listdir(path):
        try:
            img_array = cv2.imread(os.path.join(path,img), 0)
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            data.append(new_array)
            categories.append(categ_id)
        except Exception as e:
            # print(e)
            pass

print(data[0])


s1 = pd.Series(data)
s2 = pd.Series(categories)
frame = {'Img array': s1, 'category': s2}
df = pd.DataFrame(frame) 


from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

This is where I get an error when trying to fit the data:

   ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-9d98d7b11202> in <module>
      2 from sklearn.neighbors import KNeighborsClassifier
      3 
----> 4 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
      5 
      6 print(X_train)

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2094         raise TypeError("Invalid parameters passed: %s" % str(options))
   2095 
-> 2096     arrays = indexable(*arrays)
   2097 
   2098     n_samples = _num_samples(arrays[0])

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    228         else:
    229             result.append(np.array(X))
--> 230     check_consistent_length(*result)
    231     return result
    232 

~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    203     if len(uniques) > 1:
    204         raise ValueError("Found input variables with inconsistent numbers of"
--> 205                          " samples: %r" % [int(l) for l in lengths])
    206 
    207 

ValueError: Found input variables with inconsistent numbers of samples: [24946, 22451400]

How do I prepare the training data properly? By the way, I don't want to use deep learning yet; that will be my next step.

Thanks in advance for your help.

Recommended answer

If you don't use deep learning for image classification, you have to prepare your data in a form that suits supervised classification: one row of features per image and one label per image.
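The error in the question fits this picture: 24946 × 30 × 30 = 22,451,400, so one of the two inputs appears to have been flattened into a single long vector instead of one row per image. Below is a minimal sketch of the shape scikit-learn expects. The random arrays are only a stand-in for the `data` and `categories` lists from the question, and the sample count of 100 is an assumption chosen to keep the example small.

```python
import numpy as np

IMG_SIZE = 30
n_images = 100  # assumed stand-in for the 24946 images in the question

# Stand-ins for the `data` and `categories` lists built in the question's loop.
rng = np.random.default_rng(0)
data = [rng.integers(0, 256, size=(IMG_SIZE, IMG_SIZE), dtype=np.uint8)
        for _ in range(n_images)]
categories = [i % 2 for i in range(n_images)]

# One row per image, one column per pixel: shape (n_samples, n_features).
X = np.array(data).reshape(len(data), -1)
y = np.array(categories)

print(X.shape)  # (100, 900)
print(y.shape)  # (100,)
```

The first dimension of `X` must equal the length of `y`; that is the consistency check that `train_test_split` failed on.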

Steps

1) Resize all images to the same size. You can loop over the images, resize each one, and save it.

2) Get the pixel vector of each image and build the dataset. For example, if your cat images are in a "Cat" folder and your dog images are in a "Dog" folder, iterate over all images in each folder and read their pixel values. At the same time, label each row as "cat" (label = 1) or "non-cat" (label = 0).

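import os
import imageio
import pandas as pd

# Assumes every image was already resized to the same shape in step 1;
# otherwise the rows have different lengths and pandas pads them with NaN.
catimages = os.listdir("Cat")
dogimages = os.listdir("Dog")
catVec = []
dogVec = []

# Flatten each cat image into a 1-D pixel vector and label it 1.
for name in catimages:
    img = imageio.imread(os.path.join("Cat", name))
    catVec.append(img.flatten())
catdf = pd.DataFrame(catVec)
catdf.insert(loc=0, column="label", value=1)

# Flatten each dog image into a 1-D pixel vector and label it 0.
for name in dogimages:
    img = imageio.imread(os.path.join("Dog", name))
    dogVec.append(img.flatten())
dogdf = pd.DataFrame(dogVec)
dogdf.insert(loc=0, column="label", value=0)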

3) Concatenate catdf and dogdf, then shuffle the resulting DataFrame.

data = pd.concat([catdf,dogdf])      
data = data.sample(frac=1)

Now you have a labeled dataset for your images.

4) Split the dataset into training and test sets and fit the model.
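Step 4 might look like the sketch below. To keep it self-contained, a synthetic DataFrame of random pixel values stands in for the shuffled `data` frame from step 3; the sample count, the `random_state` values, and the default `KNeighborsClassifier` settings are assumptions, not choices made in the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the shuffled DataFrame from step 3:
# a "label" column followed by one column per pixel.
rng = np.random.default_rng(0)
n_samples, n_pixels = 200, 900
pixels = rng.integers(0, 256, size=(n_samples, n_pixels))
data = pd.DataFrame(pixels)
data.insert(loc=0, column="label", value=rng.integers(0, 2, size=n_samples))

# Separate the features (pixels) from the target (label).
X = data.drop(columns="label").to_numpy()
y = data["label"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

knn = KNeighborsClassifier()  # scikit-learn's default is n_neighbors=5
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

On random pixels the accuracy should hover around chance level; with real cat and dog images the same code applies once `data` comes from step 3.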
