scikit-learn: what is the difference between SVC and SGD?


Question

SVM:http://scikit-learn.org/stable/modules/svm.html#classification

SGD:http://scikit-learn.org/stable/modules/sgd.html#classification

The two seem to do pretty much the same to my eyes, as the docs write that "SGD implements a linear model". Can someone explain the differences between them?

Answer

SVM is a support-vector machine, which is a special kind of linear model. From a theoretical view it's a convex optimization problem, and we can obtain the global optimum in polynomial time. There are many different optimization approaches.

In the past, people used general quadratic-programming (QP) solvers. Nowadays, specialized approaches such as SMO are used.

sklearn's specialized SVM optimizers are based on liblinear and libsvm. There are many documents and research papers if you are interested in the algorithms.

Keep in mind that SVC (libsvm) and LinearSVC (liblinear) make different assumptions about the optimization problem, which results in different performance on the same task (with a linear kernel, LinearSVC is in general much more efficient than SVC, but some tasks can't be tackled by LinearSVC).
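A minimal sketch of that difference, assuming a synthetic dataset (the sizes and parameters are illustrative only):

```python
# A hedged sketch contrasting SVC (libsvm) and LinearSVC (liblinear)
# on the same linear task; dataset and sizes are illustrative only.
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# SVC solves the kernelized dual problem via libsvm; fit time grows
# at least quadratically with the number of samples.
svc = SVC(kernel="linear").fit(X, y)

# LinearSVC solves a slightly different linear formulation via liblinear;
# it scales much better, but offers no kernels.
lin_svc = LinearSVC().fit(X, y)

print(svc.score(X, y), lin_svc.score(X, y))
```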

SGD is a Stochastic-Gradient-Descent-based optimizer (SGD itself is a general optimization method!) which can optimize many different convex optimization problems. In fact, it is more or less the same method used in all those deep-learning approaches, so people use it in the non-convex setting too, throwing away the theoretical guarantees.

sklearn says: "Stochastic Gradient Descent (SGD) is a simple yet very efficient approach to discriminative learning of linear classifiers under convex loss functions." It is actually even more versatile now, but here it's enough to note that it subsumes (some) SVMs, logistic regression, and others.
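To make that "subsumes" point concrete, a minimal sketch: SGDClassifier turns into different linear models depending on the loss it minimizes (the loss names follow recent scikit-learn releases; older versions spell the logistic loss "log"):

```python
# A hedged sketch: the same SGD optimizer yields different linear models
# depending on the chosen loss. Dataset is synthetic and illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# hinge loss -> a linear SVM, trained by SGD
svm_like = SGDClassifier(loss="hinge").fit(X, y)

# logistic loss -> logistic regression, trained by SGD
# (spelled loss="log" in older scikit-learn versions)
logreg_like = SGDClassifier(loss="log_loss").fit(X, y)

print(svm_like.score(X, y), logreg_like.score(X, y))
```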

Now SGD-based optimization is very different from QP and the like. If one takes QP, for example, there are no hyper-parameters to tune. (This is a bit simplified, as there can be tuning, but it's not needed to guarantee convergence and performance! The theory behind QP solvers, e.g. the interior-point method, is much more robust.)

SGD-based optimizers (and first-order methods in general) are very, very hard to tune! And they need tuning! Learning rates (or learning schedules in general) are the parameters to look at, as convergence depends on them (in theory and in practice)!
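As an illustration of that tuning burden, a hedged sketch of a typical grid search over SGDClassifier's learning-rate-related parameters (the grid values are assumptions for demonstration, not recommended settings):

```python
# A hedged sketch of the tuning SGD typically needs; grid values are
# illustrative assumptions, not recommendations.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_grid = {
    "alpha": [1e-5, 1e-4, 1e-3],                  # regularization strength
    "learning_rate": ["constant", "invscaling"],  # learning-rate schedule
    "eta0": [0.001, 0.01, 0.1],                   # initial learning rate
}
search = GridSearchCV(SGDClassifier(loss="hinge"), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```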

It's a very complex topic, but some simplified rules:

  • Specialized SVM methods

    • scale badly with growing numbers of samples
    • do not need hyper-parameter tuning

  • SGD-based methods

    • generally scale better to huge data
    • need hyper-parameter tuning
    • solve only a subset of the tasks above (no kernel methods!)

My opinion: given your time budget, use the (easier-to-use) LinearSVC, as long as it's working!

Just to make it clear: I highly recommend grabbing some dataset (e.g. from within sklearn) and doing some comparisons between those candidates. The need for parameter tuning is not a theoretical problem! You will quite easily see non-optimal (objective/loss) results in the SGD case!
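A hedged sketch of that comparison on a built-in sklearn dataset (scores will vary; an untuned SGDClassifier may well trail LinearSVC here):

```python
# A hedged sketch of the suggested comparison; run it yourself and
# inspect how the untuned SGD variant fares against LinearSVC.
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

for clf in (LinearSVC(), SGDClassifier(loss="hinge")):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, round(scores.mean(), 3))
```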

And always remember: Stochastic Gradient Descent is sensitive to feature scaling (see the docs). This is more or less a consequence of being a first-order method.

