Using scikit-learn, how do I learn an SVM over a small data set?


Problem description


With scikit-learn, I have built a support vector machine, for a basic handwritten digit detection problem.


My total data set consists of 235 observations, each with 1025 features. I know that one of the advantages of a support vector machine is in situations like this, where there is a modest number of observations with a large number of features.


After my SVM is created, I look at my confusion matrix (below)...

Confusion Matrix:
[[ 6  0]
 [ 0 30]]


...and realize that holding out 15% of my data for testing (i.e., 36 observations) is not enough.
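The setup described above can be sketched as follows. The asker's actual 235-observation data set is not available, so scikit-learn's bundled `load_digits` data is used as a stand-in, with the same 15% hold-out split:

```python
# Sketch of the hold-out evaluation described above, using the
# bundled digits data as a stand-in for the asker's own data set.
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Hold out 15% of the data for testing, as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0
)

clf = SVC(gamma="scale")          # default RBF kernel
clf.fit(X_train, y_train)

cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)
```

With only a few dozen test observations, each cell of this matrix rests on very few samples, which is exactly the problem raised in the question.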


My problem is this: How can I work around this small data issue, using cross validation?

Answer


This is exactly what cross-validation (and its generalizations, such as the .632 bootstrap estimator, Err^0.632) is for. A hold-out set is reasonable only when you have large quantities of data.
