Save progress between multiple instances of partial_fit in Python SGDClassifier


Question

I've successfully followed this example for my own text classification script.

The problem is that I'm not looking to process pieces of a huge but existing data set in a loop of partial_fit calls, like they do in the example. I want to be able to add data as it becomes available, even if I shut down my Python script in the meantime.

Ideally I'd like to do something like this:

sometime in 2015:

    model2015 = partial_fit(dataset2015)
    save_to_file(model2015)
    shut down my python script

sometime in 2016:

    open my python script again
    load_from_file(model2015)
    partial_fit(dataset2016 incorporating model2015)
    save_to_file(model2016)

sometime in 2017:

    open my python script again
    etc...
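The workflow above can be sketched in scikit-learn directly: `SGDClassifier` supports incremental training via `partial_fit`, and a pickled model can be reloaded in a later session. This is a minimal sketch, assuming a binary text-classification task; the file path, the hashing feature setup, and the toy data are illustrative, not from the original question.

```python
import os
import pickle

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

MODEL_PATH = "sgd_model.pkl"  # hypothetical save location

# HashingVectorizer is stateless, so it needs no fitting between sessions.
vectorizer = HashingVectorizer(n_features=2 ** 18)

def load_or_create_model():
    # Resume from disk if an earlier session saved a model, else start fresh.
    if os.path.exists(MODEL_PATH):
        with open(MODEL_PATH, "rb") as f:
            return pickle.load(f)
    return SGDClassifier()

def update(texts, labels):
    model = load_or_create_model()
    X = vectorizer.transform(texts)
    # classes must be given on the first partial_fit call; repeating the
    # same list on later calls is harmless.
    model.partial_fit(X, labels, classes=[0, 1])
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(model, f)
    return model

# "2015" session:
update(["spam spam spam", "hello friend"], [1, 0])
# ...script shut down; a later "2016" session picks up where it left off:
model = update(["buy now", "see you tomorrow"], [1, 0])
```

Each call to `update` loads whatever model is on disk, trains on the new batch only, and saves the result, which matches the save/load cycle in the pseudocode.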

Is there any way I can do this in scikit-learn? Or in some other package (TensorFlow perhaps)?

Answer

Simply pickle your model and save it to disk. The other way is to dump the .coef_ and .intercept_ fields (which are just two arrays) and use them as initializers when you call .fit.
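The second approach can be sketched as follows, using `SGDClassifier.fit`'s `coef_init` and `intercept_init` parameters; the file name and the synthetic datasets are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Stand-ins for data arriving in different years.
X2015, y2015 = make_classification(n_samples=200, n_features=20, random_state=0)
X2016, y2016 = make_classification(n_samples=200, n_features=20, random_state=1)

# "2015" session: fit, then dump just the two weight arrays.
clf = SGDClassifier(random_state=0)
clf.fit(X2015, y2015)
np.savez("weights2015.npz", coef=clf.coef_, intercept=clf.intercept_)

# "2016" session: reload the arrays and use them to initialize a fresh fit.
saved = np.load("weights2015.npz")
clf2016 = SGDClassifier(random_state=0)
clf2016.fit(X2016, y2016,
            coef_init=saved["coef"],
            intercept_init=saved["intercept"])
```

Note that unlike `partial_fit`, calling `fit` this way runs a full training pass on the new data; the saved arrays only give it a warm starting point.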
