How to use warm_start
Question
I'd like to use the warm_start parameter to add training data to my random forest classifier. I expected it to be used like this:
clf = RandomForestClassifier(...)
clf.fit(get_data())
clf.fit(get_more_data(), warm_start=True)
But warm_start is a constructor parameter. So do I do something like this?
clf = RandomForestClassifier()
clf.fit(get_data())
clf = RandomForestClassifier(warm_start=True)
clf.fit(get_more_data())
That makes no sense to me. Won't the new call to the constructor discard previous training data? I think I'm missing something.
Recommended answer
The basic pattern (taken from Miriam's answer):
clf = RandomForestClassifier(warm_start=True)
clf.fit(get_data())
clf.fit(get_more_data())
would be the correct usage API-wise.
But there is a problem here.
As the docs say:
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.
This means that the only thing warm_start can do for you is add new decision trees. All the previous trees seem to be left untouched!
Let's check this against some of the source:
n_more_estimators = self.n_estimators - len(self.estimators_)
if n_more_estimators < 0:
    raise ValueError('n_estimators=%d must be larger or equal to '
                     'len(estimators_)=%d when warm_start==True'
                     % (self.n_estimators, len(self.estimators_)))
elif n_more_estimators == 0:
    warn("Warm-start fitting without increasing n_estimators does not "
         "fit new trees.")
This basically tells us that you need to increase the number of estimators before calling fit again; otherwise no new trees are fitted.
I have no idea what kind of usage sklearn expects here. I'm not sure whether fitting, increasing n_estimators on the fitted estimator, and fitting again is the intended usage, but I somewhat doubt it (n_estimators is a constructor parameter, so it would have to be changed after construction, e.g. with set_params).
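The fit, raise n_estimators, fit again sequence discussed above can be sketched roughly as follows. This is only an illustration under assumptions: the synthetic arrays stand in for the question's hypothetical get_data() and get_more_data() helpers, and the tree counts (10, then 15) are arbitrary. It tries to show two things: refitting without raising n_estimators adds nothing, and after raising it the earlier trees are kept as-is while only the new trees are fitted on the second batch.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the question's get_data() / get_more_data() (assumed).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_first, y_first = X[:200], y[:200]
X_more, y_more = X[200:], y[200:]

clf = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
clf.fit(X_first, y_first)
old_trees = list(clf.estimators_)  # copy of the list of the first 10 trees

# Refit without raising n_estimators: no new trees, only a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X_more, y_more)
n_after_noop = len(clf.estimators_)
warned = any("Warm-start" in str(w.message) for w in caught)

# Raise n_estimators on the fitted estimator, then refit on the new batch.
clf.set_params(n_estimators=15)
clf.fit(X_more, y_more)
n_after_grow = len(clf.estimators_)

# The first 10 trees are the very same objects; they never saw X_more.
old_trees_kept = all(a is b for a, b in zip(old_trees, clf.estimators_))

print(n_after_noop, warned, n_after_grow, old_trees_kept)
```

Note that the five added trees are trained only on the second batch and the first ten only on the first, so the result is not the same as a forest trained on all the data at once.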
Your basic approach (with this library and this classifier) is probably not a good idea for out-of-core learning! I would not pursue this further.
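For comparison, scikit-learn estimators that are meant for incremental training expose a partial_fit method, which RandomForestClassifier does not have. The sketch below swaps in a different model, SGDClassifier (a linear classifier, not a random forest), purely to illustrate batch-wise training; the synthetic data and the batch size of 200 are assumptions, not anything from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data standing in for batches that would be loaded from disk (assumed).
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
classes = np.unique(y)  # partial_fit needs the full label set on the first call

clf = SGDClassifier(random_state=0)
for start in range(0, 600, 200):
    X_batch = X[start:start + 200]
    y_batch = y[start:start + 200]
    # Each call updates the same model with one more batch.
    clf.partial_fit(X_batch, y_batch, classes=classes)

accuracy = clf.score(X, y)
print(accuracy)
```

Here every batch updates the same set of coefficients, unlike warm_start on a forest, where older trees are never revisited.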