General approach to developing an image classification algorithm for Dilbert cartoons


Question

As a self-development exercise, I want to develop a simple classification algorithm that, given a particular cell of a Dilbert cartoon, can identify which characters are present in it (Dilbert, PHB, Ratbert, etc.).

I assume the best way to do this is to (1) apply some algorithm to the image, which converts it into a set of features, and (2) use a training set and one of many possible machine learning algorithms to correlate the presence or absence of certain features with a particular character being present in the cell.

So my questions are: (a) is this the correct approach; (b) since there are a number of classification and ML algorithms to test, what is a good methodology for finding the right one; and (c) which algorithms would you start with, given that we're essentially conducting a classification exercise on a cartoon?

Recommended answer

So I think you are on the right track with respect to your step 1 (apply some algorithm to the image, which converts it into a set of features).

This project is more challenging than most ML problems because here you will actually have to create your training data set from the raw data (the individual frames comprising the cartoons). For instance, grab a frame, identify two characters in that frame, Dilbert and the character with horns (Dilbert's boss I believe, don't know his name), extract those two characters from that frame, and attach the appropriate class label to each (e.g., "1" for Dilbert).

Step 1

To extract the individual characters from each of the frames comprising a Dilbert cartoon, I would suggest a spectral decomposition of each frame. If you are not familiar with this technique, at its core it's just an eigenvector decomposition.
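To make the "it's just an eigenvector decomposition" remark concrete, here is a toy sketch in plain NumPy. The data (six points on a line), the Gaussian affinity, and the `sigma` value are all assumptions chosen for illustration; real spectral clustering libraries typically use a normalized Laplacian and cluster several eigenvectors, so treat this as the bare idea only.

```python
import numpy as np

# Six 1-D "pixels" forming two obvious groups (toy data, not cartoon data).
points = np.array([0.0, 1.0, 2.0, 6.0, 7.0, 8.0])

# Gaussian affinity: nearby points are strongly connected, distant ones weakly.
sigma = 2.0  # illustrative choice
diff = points[:, None] - points[None, :]
W = np.exp(-diff**2 / (2 * sigma**2))
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W.
D = np.diag(W.sum(axis=1))
L = D - W

# Eigendecomposition of the symmetric Laplacian (eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(L)

# The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
# changes sign between the two groups, which gives a two-way partition.
fiedler = eigvecs[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)
```

Which group ends up labelled 0 and which 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.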

If you like Python (or R, given that you can use Python-to-R bindings like RPy), then I would strongly encourage you to look at sklearn. In particular, this excellent library (which was originally developed under the SciPy scikits project umbrella, and therefore uses NumPy + SciPy for matrix computation) has several algorithms for image segmentation, one of which is based on spectral clustering. For this step in your project, you would most likely want to look at these two sklearn modules:


  • sklearn.feature_extraction (especially the image sub-module)
  • sklearn.cluster.spectral_clustering

Included with these two modules are two good example scripts, one segmenting a digital photograph and the other segmenting an image composed of three partially superimposed circles with minimal contrast with respect to each other and to the background. Both, I suspect, are more difficult problems than the decompositions you will need to perform. In other words, sklearn has two complete, well-documented example scripts included in the source distribution, both of which process data similar to yours. Either or both would be an excellent template for this step.
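As a rough idea of what such a script looks like, the sketch below follows the general pattern of the circles example: build a pixel graph with `sklearn.feature_extraction.image.img_to_graph`, convert gradients to affinities, and call `sklearn.cluster.spectral_clustering`. The synthetic two-disc image, the noise level, and `n_clusters=2` are assumptions standing in for a real cartoon frame, which would need its own preprocessing.

```python
import numpy as np
from sklearn.cluster import spectral_clustering
from sklearn.feature_extraction import image

# Synthetic stand-in for a frame: two overlapping discs on an empty background.
size = 40
x, y = np.indices((size, size))
disc1 = (x - 14) ** 2 + (y - 14) ** 2 < 8 ** 2
disc2 = (x - 24) ** 2 + (y - 24) ** 2 < 8 ** 2
mask = disc1 | disc2

rng = np.random.default_rng(0)
img = mask.astype(float) + 0.2 * rng.standard_normal(mask.shape)

# Pixel-adjacency graph restricted to the foreground; edge values are gradients.
graph = image.img_to_graph(img, mask=mask)

# Turn gradients into affinities: similar neighbours get weights close to 1.
graph.data = np.exp(-graph.data / graph.data.std())

# Partition the foreground pixels into two segments.
labels = spectral_clustering(
    graph, n_clusters=2, eigen_solver="arpack", random_state=0
)

# Map the per-pixel labels back onto the image grid (-1 marks background).
label_im = np.full(mask.shape, -1)
label_im[mask] = labels
```

Each connected region of equal label in `label_im` is then a candidate object to crop out of the frame.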

Step 2

So that's the first step; here's the second: sort all of the components of the decomposed images into groups, one group for each Dilbert character. Next, assign a class label to each group; e.g., if there are four characters from your decomposition step, then a decent choice for class labels is "0", "1", "2", and "3". Attach those class labels to the component matrices (the decomposition products from step 1) so that each character matrix is mapped to its corresponding class (Dilbert character).
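One way to picture the labelled data set is as a feature matrix with one row per extracted component plus a parallel vector of class labels. The helpers `to_feature_vector` and `build_dataset`, the 32x32 target size, and the zero-padding strategy below are hypothetical choices for illustration; resizing the crops instead of padding them would be an equally reasonable assumption.

```python
import numpy as np

def to_feature_vector(crop, size=(32, 32)):
    """Zero-pad or truncate a 2-D grayscale crop to `size`, then flatten it."""
    out = np.zeros(size, dtype=float)
    h = min(crop.shape[0], size[0])
    w = min(crop.shape[1], size[1])
    out[:h, :w] = crop[:h, :w]
    return out.ravel()

def build_dataset(groups):
    """groups maps a class label to the list of 2-D crops sorted into it."""
    X, y = [], []
    for label, crops in groups.items():
        for crop in crops:
            X.append(to_feature_vector(crop))
            y.append(label)
    return np.vstack(X), np.array(y)

# Toy usage with random arrays standing in for segmented characters.
rng = np.random.default_rng(0)
groups = {
    0: [rng.random((30, 20)), rng.random((28, 22))],  # e.g. Dilbert
    1: [rng.random((35, 40))],                        # e.g. the boss
}
X, y = build_dataset(groups)
print(X.shape, y)
```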

Step 3

Select a suitable ML technique. You have many choices for this step; the only criteria are that the technique is in the supervised category (because you have assigned class labels to your data) and that it functions as a classifier (i.e., it returns a class label, versus a regressor, which outputs a numerical value). Given this is a personal project, I would choose the one that seems most interesting to you. A few that satisfy the criteria I just mentioned are: multi-layer perceptron (neural network), support vector machine (SVM), and k-nearest neighbors (kNN).
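If you would rather compare the three candidates than pick by interest alone, cross-validation gives each one a comparable score. The sketch below uses scikit-learn's bundled digits data set purely as a stand-in for labelled cartoon components; the hyperparameters and the standardization step are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images flattened to 64 features per sample.
X, y = load_digits(return_X_y=True)

# The three classifier families named above, with illustrative settings.
candidates = {
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel="rbf", gamma="scale"),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}

# Mean 3-fold cross-validated accuracy for each candidate.
scores = {}
for name, clf in candidates.items():
    model = make_pipeline(StandardScaler(), clf)
    scores[name] = cross_val_score(model, X, y, cv=3).mean()

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into preprocessing.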

Step 4

Train, validate, and test your classifier.
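A minimal sketch of that train/validate/test cycle, again assuming the digits data set as a placeholder for your labelled components and kNN as the chosen classifier: hold out a test set, use cross-validated grid search on the remainder for validation, then score once on the held-out data. The parameter grid is an arbitrary example.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for labelled character crops.
X, y = load_digits(return_X_y=True)

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Validation: 5-fold cross-validation over a small hyperparameter grid.
search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5
)
search.fit(X_train, y_train)

# Test: a single final evaluation of the selected model.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_accuracy, 3))
```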

Alternative technique: template matching

Once Step 1 is completed (each image is decomposed into a set of objects, some of which will no doubt represent the characters), you can manually sift through these decomposition products and collect exemplars for each character in the cartoon. These are the templates.

Next, you compare objects segmented from an image with this set of unique templates. In scikit-image, another SciPy scikit, you can use the function match_template, to which you pass a candidate image and a template image; it returns a 2D array of correlation coefficients (between -1 and 1), one for each position of the template within the image.
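A small sketch of that call, using `skimage.feature.match_template` on synthetic data: the random "frame" and the template cropped from it are assumptions that make the expected match location known in advance, whereas real templates would be the hand-picked character exemplars.

```python
import numpy as np
from skimage.feature import match_template

# Synthetic frame; the template is a patch cut from a known location.
rng = np.random.default_rng(0)
frame = rng.random((60, 80))
template = frame[20:30, 35:50].copy()

# Correlation coefficient of the template at every valid offset in the frame.
result = match_template(frame, template)

# The best match is the offset (top-left corner) with the highest coefficient.
peak = np.unravel_index(np.argmax(result), result.shape)
best_score = result[peak]
print(peak, round(float(best_score), 3))
```

To classify a segmented object, you could run it against every template and assign the character whose template yields the highest peak, rejecting the object if no peak clears a threshold you choose.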
