Updating Python Pickle Object


Question

I am working on a machine learning project, and for it I am using Python's pickle module.

Basically, I am parsing a huge data set, which cannot be done in a single execution, so I need to save the classifier object and update it in the next execution.

So my question is: when I run the program again with a new data set, will the already created pickle object be modified (or updated)? If not, how can I update the same pickle object every time I run the program?

import pickle

# classifier is the trained classifier object built earlier in the program
with open("naivebayes.pickle", "wb") as save_classifier:
    pickle.dump(classifier, save_classifier)

Recommended Answer

Unpickling your classifier object re-creates it in the same state it was in when you pickled it, so you can go on to update it with fresh data from your data set. At the end of the program run, you pickle the classifier again and save it to a file again. It's a good idea not to overwrite the same file but to keep a backup (or, even better, a series of backups) in case you mess something up. That way, you can easily go back to a known good state of your classifier.
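One way to keep a series of backups is to copy the existing pickle file to a timestamped name before writing the new one. The following is only a minimal sketch: the helper name save_with_backup and the ".bak" naming scheme are assumptions made for illustration, not part of the original answer.

```python
import os
import pickle
import shutil
import tempfile
import time


def save_with_backup(obj, path):
    """Pickle obj to path, first copying any existing file to a timestamped backup.

    Hypothetical helper: returns the backup path, or None if there was nothing to back up.
    """
    backup = None
    if os.path.exists(path):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = "{}.{}.bak".format(path, stamp)
        # Avoid clobbering a backup made within the same second.
        n = 1
        while os.path.exists(backup):
            backup = "{}.{}-{}.bak".format(path, stamp, n)
            n += 1
        shutil.copy2(path, backup)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return backup


# Demonstration in a throwaway directory, with plain dicts standing in for a classifier.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "naivebayes.pickle")
first_backup = save_with_backup({"run": 1}, target)   # nothing to back up yet
second_backup = save_with_backup({"run": 2}, target)  # previous file is copied aside
```

Because each backup gets its own name, an earlier state can be restored simply by unpickling the backup file you want.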

You should experiment with pickling, using a simple program and a simple object to pickle and unpickle, until you're totally confident about how this all works.
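An experiment of that kind might look like the sketch below. It uses a plain dictionary as the "simple object" (the data and file name are made up for illustration) and shows that an unpickled object is an independent copy: changing it does not alter the file until you dump it again.

```python
import os
import pickle
import tempfile

# A simple object to experiment with (hypothetical example data).
state = {"seen": 3, "labels": ["pos", "neg"]}

path = os.path.join(tempfile.mkdtemp(), "experiment.pickle")

# Pickle the object to disk.
with open(path, "wb") as f:
    pickle.dump(state, f)

# Unpickle it: we get an equal but separate copy.
with open(path, "rb") as f:
    restored = pickle.load(f)

# Changing the restored object does not touch the file...
restored["seen"] += 1
with open(path, "rb") as f:
    on_disk = pickle.load(f)

# ...until we dump it again, which overwrites the old contents.
with open(path, "wb") as f:
    pickle.dump(restored, f)
with open(path, "rb") as f:
    after_resave = pickle.load(f)
```

This also answers the original question directly: the pickle file is never updated automatically; it changes only when the program writes it again.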

Here's a rough sketch of how to update the pickled classifier data.

import os
import pickle
from os.path import exists

import nltk
# other imports required for nltk ...

picklename = "naivebayes.pickle"

# stuff to set up featuresets ...

featuresets = [(find_features(rev), category) for (rev, category) in documents]
numtrain = int(len(documents) * 90 / 100)
training_set = featuresets[:numtrain]
testing_set = featuresets[numtrain:]

# Load or create a classifier and apply training set to it
if exists(picklename):
    # Update existing classifier
    with open(picklename, "rb") as f:
        classifier = pickle.load(f)
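    # Caveat: in NLTK, NaiveBayesClassifier.train is a classmethod that builds and
    # returns a new classifier, so confirm that this call really updates the loaded one.
    classifier.train(training_set)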
else:
    # Create a brand new classifier    
    classifier = nltk.NaiveBayesClassifier.train(training_set)

# Create backup
if exists(picklename):
    backupname = picklename + '.bak'
    if exists(backupname):
        os.remove(backupname)
    os.rename(picklename, backupname)

# Save
with open(picklename, "wb") as f:
    pickle.dump(classifier, f)

The first time you run this program, it will create a new classifier, train it with the data in training_set, then pickle the classifier to "naivebayes.pickle". Each subsequent time you run the program, it will load the old classifier and apply more training data to it.
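The load, update, save cycle can be tried without NLTK. The sketch below is an illustration under stated assumptions: a collections.Counter stands in for a model that supports incremental updates, and run_once is a hypothetical function that simulates one program run. Whether a real classifier can be updated this way depends on whether it offers an in-place update method.

```python
import os
import pickle
import tempfile
from collections import Counter


def run_once(path, labelled_words):
    """Simulate one program run: load the saved state if any, update it, save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            counts = pickle.load(f)      # continue from the previous run
    else:
        counts = Counter()               # first run: start from scratch
    counts.update(labelled_words)        # fold in this run's data
    with open(path, "wb") as f:
        pickle.dump(counts, f)
    return counts


path = os.path.join(tempfile.mkdtemp(), "counts.pickle")
after_first = run_once(path, [("good", "pos"), ("bad", "neg")])
after_second = run_once(path, [("good", "pos"), ("great", "pos")])
```

After the second call, the saved counts reflect the data from both runs, which is the behaviour the program above aims for with the classifier.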

BTW, if you are doing this in Python 2 you should use the much faster cPickle module; you can do that by replacing

import pickle

with

import cPickle as pickle
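If the same script has to run under both Python 2 and Python 3, a common pattern is to try the cPickle import and fall back to pickle. This is a small sketch of that pattern; in Python 3 the standard pickle module already uses its C implementation when available, so the fallback costs nothing there.

```python
try:
    import cPickle as pickle   # Python 2: faster C implementation
except ImportError:
    import pickle              # Python 3: cPickle no longer exists as a separate module

# Round-trip a small object; protocol 2 can be read by both Python 2 and Python 3.
data = pickle.loads(pickle.dumps({"a": 1}, protocol=2))
```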
