Does the SVM in sklearn support incremental (online) learning?

Question

I am currently designing a recommender system for text articles (a binary case of 'interesting' or 'not interesting'). One of my specifications is that it should continuously update to follow changing trends.

From what I can tell, the best way to do this is to use a machine learning algorithm that supports incremental/online learning.

Algorithms like the Perceptron and Winnow support online learning, but I am not completely certain about Support Vector Machines. Does the scikit-learn Python library support online learning and, if so, is a support vector machine one of the algorithms that can make use of it?

I am obviously not completely tied to using support vector machines, but they are usually the go-to algorithm for binary classification because of their all-round performance. I would be willing to change to whatever fits best in the end.

Recommended answer

While online algorithms for SVMs do exist, it matters whether you want a kernel or a linear SVM, because many efficient algorithms have been developed for the special case of linear SVMs.

For the linear case, if you use the SGD classifier in scikit-learn (SGDClassifier) with the hinge loss and L2 regularization, you get an SVM that can be updated online/incrementally. You can combine this with feature transforms that approximate a kernel to get something similar to an online kernel SVM.
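
As a rough illustration of the setup described above, the sketch below feeds mini-batches to `SGDClassifier` through `partial_fit`, after mapping them through `RBFSampler` as one possible kernel-approximating transform. The synthetic circle data, the `make_batch` helper, and all hyperparameter values are assumptions made for this example, not part of the original answer.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)


def make_batch(n=200):
    """Hypothetical data source: label 1 inside a circle, 0 outside."""
    X = rng.uniform(-2, 2, size=(n, 2))
    y = (np.sum(X**2, axis=1) < 2.5).astype(int)
    return X, y


# Random Fourier features that approximate an RBF kernel.
# The gamma and n_components values are illustrative assumptions.
rbf = RBFSampler(gamma=1.0, n_components=200, random_state=0)

# Hinge loss + L2 penalty gives a linear SVM trained by SGD.
clf = SGDClassifier(loss="hinge", penalty="l2", random_state=0)

# The sampler only needs the input dimensionality to draw its random
# projection, so it is fitted once on the first batch.
X0, y0 = make_batch()
rbf.fit(X0)

classes = np.array([0, 1])  # must be passed on the first partial_fit call
for _ in range(50):
    X, y = make_batch()
    clf.partial_fit(rbf.transform(X), y, classes=classes)

X_test, y_test = make_batch(1000)
accuracy = clf.score(rbf.transform(X_test), y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

For text articles, a stateless vectorizer such as `HashingVectorizer` could play the role of the feature step, since it does not need to see the whole corpus before transforming new batches.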

"One of my specifications is that it should continuously update to changing trends."

This is referred to as concept drift, and a simple online SVM will not handle it well. Using the PassiveAggressive classifier will likely give you better results, as its learning rate does not decrease over time.
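
A minimal sketch of that suggestion, assuming scikit-learn's `PassiveAggressiveClassifier` and a made-up data stream whose labelling rule flips halfway through (the `make_batch` helper and `w_true` vector are hypothetical). Some newer scikit-learn releases may mark this class as deprecated in favour of equivalent `SGDClassifier` settings, so it is worth checking the documentation for your version.

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)  # hypothetical "true" concept


def make_batch(w, n=100):
    """Hypothetical stream: label is 1 when the point lies on w's side."""
    X = rng.normal(size=(n, 5))
    y = (X @ w > 0).astype(int)
    return X, y


clf = PassiveAggressiveClassifier(C=1.0, random_state=0)
classes = np.array([0, 1])

# Phase 1: learn the original concept batch by batch.
for _ in range(20):
    X, y = make_batch(w_true)
    clf.partial_fit(X, y, classes=classes)

# Phase 2: the concept flips (a crude stand-in for concept drift);
# the same model keeps being updated on the new data.
for _ in range(20):
    X, y = make_batch(-w_true)
    clf.partial_fit(X, y, classes=classes)

X_new, y_new = make_batch(-w_true, n=1000)
accuracy_after_drift = clf.score(X_new, y_new)
print(f"accuracy on the drifted concept: {accuracy_after_drift:.3f}")
```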

Assuming you get feedback while training/running, you can try to detect decreases in accuracy over time and begin training a new model when the accuracy starts to drop (and switch to the new one when you believe it has become more accurate). JSAT has two drift detection methods (see jsat.driftdetectors) that can be used to track accuracy and alert you when it has changed.
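
The monitoring idea can be sketched in plain Python as follows. This is not JSAT's algorithm or API; `AccuracyDriftMonitor`, its window size, and its tolerance are hypothetical choices that simply compare a sliding-window accuracy against the best windowed accuracy seen so far.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Hypothetical helper: flags drift when the accuracy over the last
    `window` predictions falls more than `tolerance` below the best
    windowed accuracy observed so far."""

    def __init__(self, window=100, tolerance=0.15):
        self.window = window
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)
        self.best = 0.0

    def update(self, was_correct):
        """Record one feedback signal; return True if drift is suspected."""
        self.recent.append(1 if was_correct else 0)
        if len(self.recent) < self.window:
            return False  # not enough feedback yet
        acc = sum(self.recent) / self.window
        self.best = max(self.best, acc)
        return acc < self.best - self.tolerance


# Simulated feedback: 90% correct for 300 predictions, then 50% correct.
stream = [i % 10 != 0 for i in range(300)] + [i % 2 == 0 for i in range(300)]

monitor = AccuracyDriftMonitor()
first_alarm = next(
    (i for i, ok in enumerate(stream) if monitor.update(ok)), None
)
print("drift first suspected at prediction index:", first_alarm)
```

When the monitor fires, that would be the point to start training a replacement model alongside the current one and to switch over once the new model scores better on incoming feedback.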

JSAT also has more online linear and kernel methods.

(Bias note: I'm the author of JSAT.)
