Predict training data in sklearn


Problem Description


I use scikit-learn's SVM like so:

from sklearn import svm

clf = svm.SVC()
clf.fit(td_X, td_y)


My question is: when I use the classifier to predict the class of a member of the training set, could the classifier ever be wrong, even in scikit-learn's implementation? (That is, could clf.predict(td_X[a]) differ from td_y[a]?)
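The check the question describes can be written out directly. A minimal sketch, using synthetic stand-in data since td_X and td_y are not shown in the question:

```python
from sklearn import svm
import numpy as np

# Hypothetical stand-ins for td_X / td_y, which the question does not show.
rng = np.random.RandomState(0)
td_X = rng.normal(size=(50, 3))
td_y = rng.randint(2, size=50)

clf = svm.SVC()
clf.fit(td_X, td_y)

# Fraction of training points whose predicted class matches the true label.
# If the classifier were always right on its own training set, this would be 1.0.
train_acc = np.mean(clf.predict(td_X) == td_y)
print(train_acc)
```

Note that predict expects a 2-D array, so checking a single training point requires reshaping, e.g. clf.predict(td_X[a].reshape(1, -1)).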

Answer


Yes, definitely. Run this code, for example:

from sklearn import svm
import numpy as np

clf = svm.SVC()
np.random.seed(seed=42)
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])
y = np.random.randint(2, size=100)
clf.fit(x, y)
print(clf.score(x, y))


The score is 0.61, so nearly 40% of the training data is misclassified. Part of the reason is that, even though the default kernel is 'rbf' (which in theory should be able to classify any training set perfectly, as long as there are no two identical training points with different labels), there is also regularization to reduce overfitting. The default regularizer is C=1.0.


If you run the same code as above but switch clf = svm.SVC() to clf = svm.SVC(C=200000), you'll get an accuracy of 0.94.
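The effect of weakening the regularization can be verified side by side. A sketch of that comparison; the exact scores (0.61 and 0.94 in the answer) may differ on newer scikit-learn versions, since the default gamma changed from 'auto' to 'scale' in version 0.22:

```python
from sklearn import svm
import numpy as np

# Same synthetic data as in the answer: random points with random labels.
np.random.seed(seed=42)
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])
y = np.random.randint(2, size=100)

# Default regularization (C=1.0): the model tolerates training errors.
score_default = svm.SVC().fit(x, y).score(x, y)

# Very weak regularization (large C): the RBF kernel can nearly memorize
# the training data, so the training score rises.
score_high_c = svm.SVC(C=200000).fit(x, y).score(x, y)

print(score_default, score_high_c)
```

Keep in mind that both numbers are training accuracies; a large C will usually inflate the training score while hurting generalization.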

