"warm_start" Parameter And Its Impact On Computational Time
Problem Description
I have a logistic regression model with a defined set of parameters (warm_start=True). As always, I call LogisticRegression.fit(X_train, y_train) and then use the model to predict new outcomes.
Suppose I alter some parameters, say, C=100, and call the .fit method again using the same training data.
Theoretically, I think the second .fit should take less computational time than a model with warm_start=False. Empirically, however, this is not actually true.
Please help me understand the concept of the warm_start parameter.
P.S.: I also experimented with SGDClassifier().
Recommended Answer
I hope you understand the concept of using the previous solution as an initialization for the following fit with warm_start=True.
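A minimal sketch of that idea (the synthetic dataset from make_classification is an assumption for illustration): with warm_start=True and a compatible solver, a second fit starts its optimization from the coefficients of the first fit instead of from scratch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = LogisticRegression(solver='lbfgs', warm_start=True, max_iter=1000)
clf.fit(X, y)
first_coef = clf.coef_.copy()

# Change a hyperparameter and refit: because warm_start=True, the
# optimizer is initialized from first_coef rather than from zeros
clf.set_params(C=100)
clf.fit(X, y)
```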
The documentation states that the warm_start parameter is useless with the liblinear solver, as there is no working implementation for this special linear case. In addition, the liblinear solver is the default choice for LogisticRegression (in scikit-learn versions before 0.22), which basically means that the weights will be completely reinstantiated before each new fit.
To utilize the warm_start parameter and reduce computational time, you should use one of the following solvers for your LogisticRegression:
- newton-cg or lbfgs, which support the L2-norm penalty. They are also usually better for multiclass problems;
- sag or saga, which converge faster than the liblinear solver on large datasets and use multinomial loss during descent.
from sklearn.linear_model import LogisticRegression

X = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
y = [1, 0, 1]

# With the sag solver, warm_start reuses the previous solution
# as the starting point before each new fit
clf = LogisticRegression(solver='sag', warm_start=True)
clf.fit(X, y)
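To see the effect empirically, a sketch along these lines can compare iteration counts rather than wall-clock time (the make_classification data and the n_iter_ comparison are illustrative assumptions; any speedup depends on how close the old solution is to the new one, so the warm-started model is often, but not always, cheaper):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

warm = LogisticRegression(solver='sag', warm_start=True, max_iter=10000)
cold = LogisticRegression(solver='sag', warm_start=False, max_iter=10000)

warm.fit(X, y)
cold.fit(X, y)

# Refit with a different C: the warm-started model resumes from its
# previous coefficients, while the cold one starts over from scratch
warm.set_params(C=100).fit(X, y)
cold.set_params(C=100).fit(X, y)

# n_iter_ reports the iterations used by the last fit for each model
print(warm.n_iter_, cold.n_iter_)
```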
I hope that helps.