How to update a Logistic Regression model?


Question

I have trained a logistic regression model. Now I need to update (partial fit) the model with a new set of training data. Is this possible?

Recommended answer

You cannot use partial_fit on LogisticRegression.

But you can:

  • use warm_start=True, which reuses the solution of the previous call to fit as initialization, to speed up convergence.
  • use SGDClassifier with loss='log' (spelled 'log_loss' in scikit-learn 1.1 and later), which fits the same logistic regression model by stochastic gradient descent and supports partial_fit.

Note the difference between partial_fit and warm_start. Both methods start from the previous model and update it, but partial_fit only updates the model slightly, while warm_start goes all the way to convergence on the new training data, forgetting the previous model. warm_start is only used to speed up convergence.
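The "forgetting" behaviour of warm_start can be checked with a small experiment. This is only a sketch on synthetic data: the `make_batch` helper and the two weight vectors are assumptions chosen so that the old and new batches clearly disagree.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

def make_batch(n, w):
    # Hypothetical noisy data whose true weights are given by w.
    X = rng.randn(n, 2)
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    y = (rng.rand(n) < p).astype(int)
    return X, y

X_old, y_old = make_batch(400, np.array([2.0, -1.0]))
X_new, y_new = make_batch(400, np.array([-1.0, 2.0]))  # deliberately different

warm = LogisticRegression(warm_start=True, tol=1e-8, max_iter=1000)
warm.fit(X_old, y_old)  # ordinary fit on the old data
warm.fit(X_new, y_new)  # starts from the old coefficients, converges on X_new

cold = LogisticRegression(tol=1e-8, max_iter=1000)
cold.fit(X_new, y_new)  # has never seen the old data

# Largest coefficient difference between the warm-started and cold models.
max_gap = np.abs(warm.coef_ - cold.coef_).max()
```

Because the regularised logistic loss has a single optimum, the warm-started model should end up at essentially the same coefficients as the cold one, so `max_gap` is expected to be close to zero: the old data only influenced the starting point.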

See also the glossary:

warm_start

When fitting an estimator repeatedly on the same dataset, but for multiple parameter values (such as to find the value maximizing performance as in grid search), it may be possible to reuse aspects of the model learnt from the previous parameter value, saving time. When warm_start is true, the existing fitted model attributes are used to initialise the new model in a subsequent call to fit.
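As a rough illustration of this use, the sketch below refits one LogisticRegression over several values of C on the same data. The data and the list of C values are arbitrary assumptions, not something taken from the glossary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(300, 5)
# Hypothetical labels from a noisy linear rule.
y = (X[:, 0] - X[:, 1] + 0.5 * rng.randn(300) > 0).astype(int)

clf = LogisticRegression(warm_start=True, max_iter=1000)

coef_norms = []
for C in [0.01, 0.1, 1.0, 10.0]:
    clf.set_params(C=C)  # change a parameter, keep the data
    clf.fit(X, y)        # initialised from the previous solution
    coef_norms.append(float(np.linalg.norm(clf.coef_)))
```

Each fit still runs to convergence for its own value of C. The previous solution only serves as a starting point, which typically reduces the number of iterations needed.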

Note that this is only applicable for some models and some parameters, and even some orders of parameter values. For example, warm_start may be used when building random forests to add more trees to the forest (increasing n_estimators) but not to reduce their number.
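The random forest case can be sketched as follows, using a synthetic dataset from `make_classification` chosen here only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

forest = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
forest.fit(X, y)
first_trees = list(forest.estimators_)  # remember the first 10 trees

# Increasing n_estimators keeps the existing trees and trains 15 more.
forest.set_params(n_estimators=25)
forest.fit(X, y)
n_trees = len(forest.estimators_)

# Decreasing n_estimators with warm_start=True is rejected.
try:
    forest.set_params(n_estimators=5)
    forest.fit(X, y)
    shrink_rejected = False
except ValueError:
    shrink_rejected = True
```

After the second fit the forest holds 25 trees, the first 10 of which are the original ones. The attempt to shrink it raises a ValueError and leaves the fitted trees untouched.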

partial_fit also retains the model between calls, but differs: with warm_start the parameters change and the data is (more-or-less) constant across calls to fit; with partial_fit, the mini-batch of data changes and model parameters stay fixed.

There are cases where you want to use warm_start to fit on different, but closely related data. For example, one may initially fit to a subset of the data, then fine-tune the parameter search on the full dataset. For classification, all data in a sequence of warm_start calls to fit must include samples from each class.


partial_fit

Facilitates fitting an estimator in an online fashion. Unlike fit, repeatedly calling partial_fit does not clear the model, but updates it with respect to the data provided. The portion of data provided to partial_fit may be called a mini-batch. Each mini-batch must be of consistent shape, etc.

partial_fit may also be used for out-of-core learning, although usually limited to the case where learning can be performed online, i.e. the model is usable after each partial_fit and there is no separate processing needed to finalize the model. cluster.Birch introduces the convention that calling partial_fit(X) will produce a model that is not finalized, but the model can be finalized by calling partial_fit(), i.e. without passing a further mini-batch.

Generally, estimator parameters should not be modified between calls to partial_fit, although partial_fit should validate them as well as the new mini-batch of data. In contrast, warm_start is used to repeatedly fit the same estimator with the same data but varying parameters.
