"warm_start"参数及其对计算时间的影响 [英] `warm_start` Parameter And Its Impact On Computational Time


Question

I have a logistic regression model with a defined set of parameters (warm_start=True).

As always, I call LogisticRegression.fit(X_train, y_train) and use the model after to predict new outcomes.

Suppose I alter some parameters, say, C=100 and call .fit method again using the same training data.

Theoretically, I think the second .fit should take less computational time than it would for a model with warm_start=False. Empirically, however, this does not actually hold.

请帮我理解warm_start参数的概念.

Please help me understand the concept of the warm_start parameter.

P.S.: I also experimented with SGDClassifier().
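The experiment described above can be sketched as follows (a minimal, hedged sketch: the synthetic dataset from make_classification and the timing harness are my own assumptions, not part of the original question):

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for X_train, y_train (an assumption for illustration)
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

clf = LogisticRegression(solver='liblinear', warm_start=True)

start = time.perf_counter()
clf.fit(X, y)
first = time.perf_counter() - start

clf.set_params(C=100)  # alter a parameter, then refit on the same data
start = time.perf_counter()
clf.fit(X, y)
second = time.perf_counter() - start

# With the liblinear solver, no systematic speed-up shows up on the refit
print(first, second)
```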

Answer

I hope you understand the concept of using the previous solution as an initialization for the following fit with warm_start=True.
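A hedged illustration of this idea: with a solver that actually supports warm starts (e.g. 'sag'), the second fit resumes from the previous coefficients and usually needs far fewer iterations on the same data. The synthetic dataset below is my own choice for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

clf = LogisticRegression(solver='sag', warm_start=True, max_iter=1000)
clf.fit(X, y)
iters_first = clf.n_iter_[0]   # iterations needed from a cold start

clf.fit(X, y)                  # resumes from the previous solution
iters_second = clf.n_iter_[0]  # typically much smaller

print(iters_first, iters_second)
```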

The documentation states that the warm_start parameter is useless with the liblinear solver, as there is no working implementation for this special linear case. Additionally, the liblinear solver is the default choice for LogisticRegression, which basically means that the weights will be completely reinstantiated before each new fit.

To utilize the warm_start parameter and reduce computational time, you should use one of the following solvers for your LogisticRegression:

  • newton-cg or lbfgs, which support the L2-norm penalty; they are also usually better for multiclass problems;
  • sag or saga, which converge faster than the liblinear solver on larger datasets and use multinomial loss during descent.
from sklearn.linear_model import LogisticRegression

X = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
y = [1, 0, 1]

# warm_start would work fine before each new fit
clf = LogisticRegression(solver='sag', warm_start=True)

clf.fit(X, y)
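Since the P.S. mentions SGDClassifier, note that it supports warm_start as well; a minimal sketch (the alpha value is illustrative, not taken from the original answer):

```python
from sklearn.linear_model import SGDClassifier

X = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
y = [1, 0, 1]

sgd = SGDClassifier(warm_start=True, random_state=0)
sgd.fit(X, y)                 # first fit initializes the weights
sgd.set_params(alpha=0.001)   # tweak a hyperparameter
sgd.fit(X, y)                 # refit continues from the previous coefficients
print(sgd.predict([[1, 2, 3]]))
```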

I hope that helps.

