How to update an SVM model with new data


Question

I have two data sets of different sizes.

1) Data set 1 is high-dimensional, with 4500 samples (sketches).

2) Data set 2 is low-dimensional, with 1000 samples (real data). I assume that both data sets have the same distribution.

I want to train a non-linear SVM model using sklearn on the first data set (as pre-training), and then update the model on a part of the second data set (to fit the model). How can I implement this kind of update in sklearn? How can I update an SVM model?

Answer

In sklearn you can do this only for a linear kernel, using SGDClassifier (with an appropriate choice of loss/penalty terms: the loss should be hinge and the penalty L2). Incremental learning is supported through the partial_fit method, which is implemented by neither SVC nor LinearSVC.
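A minimal sketch of what such an incremental update could look like; the array names, label values, and feature dimension below are placeholders, and both data sets are assumed to share the same feature space:

```python
# Sketch only: random placeholder data standing in for the two data sets.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X_pretrain = rng.randn(4500, 20)              # "sketch" data, assumed 20 features
y_pretrain = rng.randint(0, 2, 4500)
X_new = rng.randn(1000, 20)                   # "real" data in the same feature space
y_new = rng.randint(0, 2, 1000)

# hinge loss + L2 penalty makes SGDClassifier behave like a linear SVM
clf = SGDClassifier(loss="hinge", penalty="l2")

# the first call to partial_fit must list every class that will ever appear
clf.partial_fit(X_pretrain, y_pretrain, classes=np.unique(y_pretrain))

# later, update the same model with the new data
clf.partial_fit(X_new, y_new)
```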

Unfortunately, in practice fitting an SVM incrementally on such small datasets is rather useless. An SVM has an easily obtainable global solution, so you do not need pre-training of any form; in fact it should not matter at all, if you are thinking of pre-training in the neural-network sense. If correctly implemented, the SVM should completely forget the previous dataset. Why not learn on the whole data in one pass? That is what an SVM is supposed to do, unless you are working with some non-convex modification of SVM (in which case pre-training makes sense).
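For comparison, a minimal sketch of simply fitting a non-linear SVM on all the data in one pass; again the random arrays only stand in for the two data sets, and both are assumed to live in the same feature space:

```python
# Sketch only: placeholder data for the combined, single-pass fit.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X1, y1 = rng.randn(4500, 20), rng.randint(0, 2, 4500)   # sketches
X2, y2 = rng.randn(1000, 20), rng.randint(0, 2, 1000)   # real data

X_all = np.vstack([X1, X2])
y_all = np.concatenate([y1, y2])

clf = SVC(kernel="rbf")       # non-linear kernel, trained once on everything
clf.fit(X_all, y_all)
```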

To sum up:


  • From a theoretical and practical point of view there is no point in pre-training an SVM. You can either learn only on the second dataset, or on both at the same time. Pre-training is only reasonable for methods that suffer from local minima (or hard convergence of any kind) and thus need to start near the actual solution to be able to find a reasonable model (like neural networks). SVM is not one of them.
  • You can use incremental fitting (although in sklearn it is very limited) for efficiency reasons, but for such a small dataset you will be just fine fitting the whole dataset at once.

