Scikit-learn: Loading images from folder to create a labelled dataset for KNN classification

Problem description

I want to do handwritten digit recognition using K-Nearest Neighbours classification with scikit-learn. I have a folder that has 5001 images of handwritten digits (500 images for each digit from 0-9).

I am trying to find a way to create a dataset based on these images, so that I can then create a training and testing set. I have read a lot of online tutorials about how to do K-Nearest Neighbours classification using scikit-learn but most of the tutorials load existing datasets such as the MNIST dataset of handwritten digits.

Is there any way to create your own dataset by reading images from a folder and then assigning a label to each image? I am not sure what methods I can use to do this. Any insights are appreciated.

Solution

To read the data, you can do something like this:

from os import listdir
from os.path import isfile, join
import matplotlib.pyplot as plt

mypath = '.'  # edit with the path to your data
# Keep only regular files (this skips sub-directories)
files = [f for f in listdir(mypath) if isfile(join(mypath, f))]

x = []  # image arrays
y = []  # labels, one per image

for file in files:
    # Assuming your images are named like "eight_1.png",
    # the label is the part before the underscore: "eight"
    label = file.split('_')[0]
    y.append(label)
    # Join with mypath so the image loads even when mypath is not the current directory
    img = plt.imread(join(mypath, file))
    x.append(img)

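Then you will need to manipulate x and y a little before giving them to scikit-learn (its estimators expect a 2-D array with one row per sample, so each image has to be flattened into a 1-D feature vector), but you should be fine.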
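As a rough sketch of that last step, the snippet below flattens each image into a row, splits the data, and fits a KNN classifier. It is not part of the original answer. The helper name `to_feature_matrix`, the stand-in random "images", and the choices `n_neighbors=3` and `test_size=0.25` are assumptions for illustration only. It also assumes every image has the same shape; images loaded with `plt.imread` may carry colour or alpha channels, which would simply become extra features here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def to_feature_matrix(images):
    """Flatten each image into one row of a 2-D (n_samples, n_features) array.

    Hypothetical helper; assumes all images share the same shape.
    """
    return np.array([np.asarray(img, dtype=float).ravel() for img in images])


# Stand-in data so the sketch runs on its own: 40 fake 8x8 grayscale images
# in two well-separated classes. Replace x and y with the lists built above.
rng = np.random.default_rng(0)
x = [rng.random((8, 8)) * 0.1 for _ in range(20)] + \
    [rng.random((8, 8)) * 0.1 + 0.9 for _ in range(20)]
y = ["zero"] * 20 + ["one"] * 20

X = to_feature_matrix(x)   # one flattened image per row
labels = np.array(y)       # string labels work directly with scikit-learn

# Hold out a quarter of the images for testing, keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# n_neighbors=3 is an arbitrary choice for this sketch, not a recommendation
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print("Test accuracy:", accuracy)
```

With real digit images, the number of neighbours is worth tuning, for example by comparing the test accuracy for a few values of `n_neighbors`.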
