What is the difference between LinearSVC and SVC(kernel="linear")?

Question
I found `sklearn.svm.LinearSVC` and `sklearn.svm.SVC(kernel='linear')`, and they seem very similar to me, but I get very different results on Reuters:
```
sklearn.svm.LinearSVC: 81.05% in   28.87s train /    9.71s test
sklearn.svm.SVC:       33.55% in 6536.53s train / 2418.62s test
```
Both have a linear kernel. The tolerance of `LinearSVC` is higher than that of `SVC`:
```
LinearSVC(C=1.0, tol=0.0001, max_iter=1000, penalty='l2', loss='squared_hinge', dual=True, multi_class='ovr', fit_intercept=True, intercept_scaling=1)
SVC(C=1.0, tol=0.001, max_iter=-1, shrinking=True, probability=False, cache_size=200, decision_function_shape=None)
```
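For reference, even a minimal attempt at aligning the overlapping parameters still does not make the two equivalent. This is a sketch on the iris data as a small stand-in for Reuters, switching `LinearSVC` to the plain hinge loss that `SVC` always uses and matching the tolerance:

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, SVC

X, y = load_iris(return_X_y=True)

# LinearSVC defaults to loss='squared_hinge'; SVC always optimizes the
# plain hinge loss, so align the loss, tolerance, and C explicitly.
lin = LinearSVC(loss='hinge', tol=0.001, C=1.0, max_iter=100000).fit(X, y)
svc = SVC(kernel='linear', tol=0.001, C=1.0).fit(X, y)

print('LinearSVC: %.3f' % lin.score(X, y))
print('SVC      : %.3f' % svc.score(X, y))
```

The scores stay close but generally differ, because the multiclass strategies (one-vs-rest vs. one-vs-one) still differ.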
How do the two functions differ otherwise? Even if I set `kernel='linear'`, `tol=0.0001`, `max_iter=1000`, and `decision_function_shape='ovr'`, `SVC` takes much longer than `LinearSVC`. Why?
I use sklearn 0.18 and both are wrapped in `OneVsRestClassifier`. I'm not sure if this is the same as `multi_class='ovr'` / `decision_function_shape='ovr'`.
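One way to see what the `OneVsRestClassifier` wrapper does, independently of `decision_function_shape`, is to count the binary estimators it fits. A quick sketch, again using iris (3 classes) as assumed stand-in data:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes

# OneVsRestClassifier fits one binary SVC per class, regardless of the
# one-vs-one strategy SVC would use internally on a multiclass target.
ovr = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)
print(len(ovr.estimators_))  # → 3
```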
Answer
Indeed, `LinearSVC` and `SVC(kernel='linear')` yield different results, i.e. different metric scores and decision boundaries, because they use different approaches. The toy example below demonstrates this:
```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC, SVC

X, y = load_iris(return_X_y=True)

clf_1 = LinearSVC().fit(X, y)  # it is possible to pass loss='hinge'
clf_2 = SVC(kernel='linear').fit(X, y)

score_1 = clf_1.score(X, y)
score_2 = clf_2.score(X, y)

print('LinearSVC score %s' % score_1)
print('SVC score %s' % score_2)
```
```
0.96666666666666667
0.98666666666666669
```
The key reasons for this difference are the following:
- By default, `LinearSVC` minimizes the squared hinge loss, while `SVC` minimizes the regular hinge loss. It is possible to manually pass the `'hinge'` string to the `loss` parameter of `LinearSVC`.
- `LinearSVC` uses the One-vs-All (also known as One-vs-Rest) multiclass reduction, while `SVC` uses the One-vs-One multiclass reduction, as the scikit-learn documentation also notes. Consequently, for a multiclass classification problem, `SVC` fits `N * (N - 1) / 2` models, where `N` is the number of classes. `LinearSVC`, by contrast, simply fits `N` models. If the classification problem is binary, only one model is fit in both scenarios.
- The `multi_class` and `decision_function_shape` parameters have nothing in common. The second one is an aggregator that transforms the results of the decision function into the convenient shape `(n_samples, n_classes)`, while `multi_class` is an algorithmic approach to establishing the solution.
- The underlying estimator for `LinearSVC` is liblinear, which does in fact penalize the intercept; `SVC` uses libsvm estimators, which do not. liblinear estimators are optimized for the linear (special) case and thus converge much faster on large amounts of data than libsvm. That is why `LinearSVC` takes less time to solve the problem.
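The difference in model counts can be checked directly via the shapes of the learned coefficient matrices. A sketch with a synthetic 4-class problem (assumed data, not the Reuters set), where the counts become distinguishable:

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC

# With 4 classes, one-vs-rest fits 4 models while one-vs-one
# fits 4 * 3 / 2 = 6 pairwise models.
X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)

lin = LinearSVC(max_iter=10000).fit(X, y)
svc = SVC(kernel='linear').fit(X, y)

print(lin.coef_.shape)  # (4, 5): one hyperplane per class (one-vs-rest)
print(svc.coef_.shape)  # (6, 5): one hyperplane per class pair (one-vs-one)
```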
In fact, `LinearSVC` is not actually linear after the intercept scaling, as was stated in the comments section.
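The intercept handling can be observed through the `intercept_scaling` parameter: liblinear appends a constant feature equal to this value and regularizes the resulting intercept together with the weights, so a larger value weakens that penalty. A minimal sketch, again on iris as assumed data:

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# liblinear appends a constant column equal to intercept_scaling and
# penalizes the corresponding weight; increasing the value reduces the
# effective regularization of the intercept, shifting the fitted values.
for scale in (1.0, 100.0):
    clf = LinearSVC(intercept_scaling=scale, max_iter=10000).fit(X, y)
    print(scale, clf.intercept_)
```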