Does the SVM in sklearn support incremental (online) learning?

Question

I am currently in the process of designing a recommender system for text articles (a binary case of 'interesting' or 'not interesting'). One of my specifications is that it should continuously update to changing trends.

From what I can tell, the best way to do this is to use a machine learning algorithm that supports incremental/online learning.

Algorithms like the Perceptron and Winnow support online learning, but I am not completely certain about Support Vector Machines. Does the scikit-learn Python library support online learning, and if so, is a support vector machine one of the algorithms that can make use of it?

I am obviously not completely tied to using support vector machines, but they are usually the go-to algorithm for binary classification due to their all-round performance. I would be willing to switch to whatever fits best in the end.

Recommended Answer

While online algorithms for SVMs do exist, it is important to specify whether you want a kernel or a linear SVM, as many efficient algorithms have been developed for the special case of linear SVMs.

For the linear case, if you use SGDClassifier in scikit-learn with the hinge loss and L2 regularization, you get an SVM that can be updated online/incrementally (through its partial_fit method). You can combine this with feature transforms that approximate a kernel (such as those in sklearn.kernel_approximation) to get something similar to an online kernel SVM.
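A minimal sketch of both suggestions, assuming synthetic 2-D data in place of real article features (the make_batch helper is hypothetical). SGDClassifier with loss="hinge" and penalty="l2" is a linear SVM trained by stochastic gradient descent, and partial_fit updates it one batch at a time. For the kernel variant, RBFSampler (random Fourier features approximating an RBF kernel) is one of the transforms in sklearn.kernel_approximation. The gamma, n_components and alpha values are illustrative, not tuned.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])  # 0 = not interesting, 1 = interesting


def make_batch(n=200):
    """Hypothetical stand-in for a batch of labelled article feature vectors."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y


# Linear SVM: hinge loss + L2 penalty, trained by SGD.
linear_svm = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, random_state=0)

# Approximate RBF-kernel SVM: random Fourier features fed to the same kind of model.
# With a numeric gamma, fit() only draws the random projection for the input width.
rbf_features = RBFSampler(gamma=1.0, n_components=100, random_state=0)
rbf_features.fit(make_batch()[0])
kernel_svm = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, random_state=0)

for _ in range(20):  # each iteration stands for a new batch from the stream
    X, y = make_batch()
    # `classes` is required on the first partial_fit call (repeating it is fine).
    linear_svm.partial_fit(X, y, classes=classes)
    kernel_svm.partial_fit(rbf_features.transform(X), y, classes=classes)

X_test, y_test = make_batch(1000)
print("linear:", linear_svm.score(X_test, y_test))
print("kernel approx:", kernel_svm.score(rbf_features.transform(X_test), y_test))
```

In a recommender, each partial_fit call would receive the newest batch of articles labelled by user feedback. The kernel transform has to stay fixed once fitted, so that the model always sees the same feature space.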

One of my specifications is that it should continuously update to changing trends.

This is referred to as concept drift, and it will not be handled well by a simple online SVM. Using the PassiveAggressiveClassifier will likely give you better results, as its learning rate does not decrease over time.
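A small sketch of the idea, assuming a synthetic stream in which the notion of "interesting" suddenly reverses halfway through (an extreme, hypothetical form of concept drift). PassiveAggressiveClassifier also supports partial_fit; its per-example step size is computed from the current loss and capped by C, rather than following a decaying schedule. C=1.0 and the batch sizes are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])


def make_batch(n, flipped):
    """Hypothetical stream: label = x0 > 0, reversed once the trend changes."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, (1 - y if flipped else y)


clf = PassiveAggressiveClassifier(C=1.0, random_state=0)  # C caps each step

# Phase 1: learn the original concept from 20 small batches.
for _ in range(20):
    X, y = make_batch(100, flipped=False)
    clf.partial_fit(X, y, classes=classes)

X_test, y_old = make_batch(1000, flipped=False)
acc_before_drift = clf.score(X_test, y_old)

# Phase 2: the concept flips; keep updating on the newly labelled data.
for _ in range(20):
    X, y = make_batch(100, flipped=True)
    clf.partial_fit(X, y, classes=classes)

acc_after_drift = clf.score(X_test, 1 - y_old)  # scored against the new concept
print(acc_before_drift, acc_after_drift)
```

Real drift is usually gradual rather than a full label flip, so this only shows that the model keeps moving toward recent data, not how quickly it would adapt on your data.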

Assuming you get feedback while training/running, you can attempt to detect decreases in accuracy over time, begin training a new model when accuracy starts to drop, and switch to the new one once you believe it has become more accurate. JSAT has 2 drift detection methods (see jsat.driftdetectors) that can be used to track accuracy and alert you when it has changed.

It also has more online linear and kernel methods.

(bias note: I'm the author of JSAT).
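The accuracy-monitoring idea described above can be sketched in Python. This is a minimal, from-scratch version of the Drift Detection Method (DDM) of Gama et al.; it is not JSAT's code, and the class name SimpleDDM and its thresholds (30 warm-up outcomes, 2 and 3 standard deviations) are illustrative assumptions. It tracks the running error rate p with s = sqrt(p(1 - p)/n), remembers the lowest p + s seen, and reports "warning" or "drift" when p + s climbs too far above it.

```python
import math


class SimpleDDM:
    """Minimal DDM-style drift detector (illustrative sketch, not JSAT's code)."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples      # warm-up before any alarm
        self.warning_level = warning_level  # std-devs above the best level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, mistake):
        """Feed one outcome (True = model was wrong); return the state."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # new best (lowest) error level
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start fresh, e.g. alongside a newly trained model
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"  # e.g. start training a replacement model
        return "stable"


# Simulated feedback: 10% errors for 1000 articles, then 50% after a trend change.
detector = SimpleDDM()
outcomes = [i % 10 == 9 for i in range(1000)] + [i % 2 == 0 for i in range(500)]
states = [detector.update(m) for m in outcomes]
print("first warning at", states.index("warning"))
print("first drift at", states.index("drift"))
```

In practice each outcome would be whether the current model mispredicted an article the user then rated. One design caveat: if the model makes no errors during warm-up, s_min is 0 and the very first mistake is reported as drift, so a real system would want a guard for that case.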
