Save progress between multiple instances of partial_fit in Python SGDClassifier
Question
I've successfully followed this example for my own text classification script.
The problem is that I'm not looking to process pieces of a huge but already-existing data set in a loop of partial_fit calls, as the example does. I want to be able to add data as it becomes available, even if I shut down my Python script in the meantime.
Ideally, I'd like to do something like this:
sometime in 2015:
    model2015 = partial_fit(dataset2015)
    save_to_file(model2015)
    shut down my Python script

sometime in 2016:
    open my Python script again
    load_from_file(model2015)
    partial_fit(dataset2016, incorporating model2015)
    save_to_file(model2016)

sometime in 2017:
    open my Python script again
    etc...
Is there any way I can do this in scikit-learn? Or in some other package (TensorFlow, perhaps)?
Answer
Simply pickle your model and save it to disk. The other option is to dump the .coef_ and .intercept_ attributes (which are just two arrays) and pass them as initializers when you call .fit.
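Both approaches can be sketched roughly like this (the file names and the tiny toy datasets are made up for illustration; note that the first call to partial_fit must be given the full set of class labels up front):

```python
import pickle
import numpy as np
from sklearn.linear_model import SGDClassifier

# --- sometime in 2015 ---
clf = SGDClassifier(random_state=0)
X_2015 = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y_2015 = np.array([0, 1, 1, 0])
# On the very first partial_fit call, pass every class the model will ever see.
clf.partial_fit(X_2015, y_2015, classes=np.array([0, 1]))

# Approach 1: pickle the whole estimator to disk.
with open("model2015.pkl", "wb") as f:
    pickle.dump(clf, f)

# --- sometime in 2016, in a new process ---
with open("model2015.pkl", "rb") as f:
    clf = pickle.load(f)

X_2016 = np.array([[0.5, 0.5], [0.9, 0.1]])
y_2016 = np.array([0, 1])
clf.partial_fit(X_2016, y_2016)  # continues training from the 2015 weights

# Approach 2: persist only the learned arrays and use them
# as initializers for a fresh .fit call later.
np.save("coef2016.npy", clf.coef_)
np.save("intercept2016.npy", clf.intercept_)

clf2 = SGDClassifier(random_state=0)
clf2.fit(X_2016, y_2016,
         coef_init=np.load("coef2016.npy"),
         intercept_init=np.load("intercept2016.npy"))
```

Pickling preserves everything (hyperparameters, learned weights, internal state), so it is the simpler option; dumping .coef_ and .intercept_ only keeps the weights, which means you must reconstruct the estimator with the same hyperparameters yourself.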