sklearn multiclass SVM function

Question

I have multiclass labels and want to compute the accuracy of my model.
I am confused about which sklearn function I need to use. As far as I understand, the code below is only used for binary classification.

# dividing X, y into train and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# training a linear SVM classifier
from sklearn.svm import SVC
svm_model_linear = SVC(kernel='linear', C=1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)

# model accuracy for X_test
accuracy = svm_model_linear.score(X_test, y_test)
print(accuracy)

As I understood from the linked question, "Which decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?", for multiclass classification I should use OneVsRestClassifier with decision_function_shape set to ovr or ovo, and check which one works better:

from sklearn.multiclass import OneVsRestClassifier
svm_model_linear = OneVsRestClassifier(SVC(kernel='linear', C=1, decision_function_shape='ovr')).fit(X_train, y_train)

The main problem is that prediction time matters to me, but it takes about 1 minute to run the classifier and predict the data (on top of the time spent on feature reduction such as PCA, which also takes some time). Any suggestions for reducing the time taken by a multiclass SVM?

Recommended answer

There are multiple things to consider here:

1) OneVsRestClassifier separates out all the labels and trains multiple SVM objects (one for each label) on the given data. So each time, only binary data is supplied to a single SVM object.

2) SVC internally uses libsvm, which uses a one-vs-one ('ovo') strategy for multiclass output. But this is of no use because of point 1: libsvm will only ever get binary data.

Even if it did, decision_function_shape is not taken into account during training; it only changes the shape of the output of decision_function. So it does not matter whether you provide decision_function_shape = 'ovr' or decision_function_shape = 'ovo'.
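The two points above can be checked with a small sketch. The digits dataset and the variable names here are assumptions chosen for illustration and are not part of the original question; any multiclass dataset should behave similarly.

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Small 10-class dataset, used only for illustration (an assumption).
X, y = load_digits(return_X_y=True)
n_classes = len(set(y))

# Point 1: the wrapper fits one binary SVC per class label.
ovr_wrapper = OneVsRestClassifier(SVC(kernel="linear", C=1)).fit(X, y)
n_binary_models = len(ovr_wrapper.estimators_)
inner_classes = ovr_wrapper.estimators_[0].classes_  # each inner SVC sees a two-class problem

# Point 2: on a plain SVC, decision_function_shape only changes the output shape.
svc_ovr = SVC(kernel="linear", C=1, decision_function_shape="ovr").fit(X, y)
svc_ovo = SVC(kernel="linear", C=1, decision_function_shape="ovo").fit(X, y)
shape_ovr = svc_ovr.decision_function(X[:5]).shape  # one column per class
shape_ovo = svc_ovo.decision_function(X[:5]).shape  # one column per pair of classes
same_predictions = bool((svc_ovr.predict(X) == svc_ovo.predict(X)).all())

print(n_binary_models, shape_ovr, shape_ovo, same_predictions)
```

With the default settings both plain SVC models should predict the same labels, which suggests that a single SVC without the wrapper is already a multiclass classifier.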

So it seems that you are looking at the problem the wrong way. decision_function_shape should not affect the speed. Try standardizing your data before fitting; SVMs work well with standardized data.
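As a minimal sketch of the standardization advice, the steps can be chained in a pipeline and timed. The digits dataset, the choice of 20 PCA components, and the variable names are assumptions for illustration only; the actual speedup, if any, depends on your data.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative multiclass data (an assumption, not the asker's dataset).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardize, reduce dimensionality, then fit a single multiclass SVC.
# n_components=20 is an arbitrary value picked for this sketch.
model = make_pipeline(
    StandardScaler(), PCA(n_components=20), SVC(kernel="linear", C=1)
)

start = time.perf_counter()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
elapsed = time.perf_counter() - start

# accuracy_score and score() both report plain accuracy, for any number of classes.
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy={accuracy:.3f}, fit+predict took {elapsed:.2f}s")
```

Timing the same pipeline with and without StandardScaler on your own data is a simple way to see whether scaling helps in your case.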
