Pickling a trained classifier yields different results from those obtained directly from an identically trained classifier


Problem description


I'm trying to pickle a trained SVM classifier from the scikit-learn library so that I don't have to train it over and over again. But when I pass the test data to the classifier loaded from the pickle, I get unusually high values for accuracy, F-measure, etc. If the test data is passed directly to a classifier that was not pickled, it gives much lower values. I don't understand why pickling and unpickling the classifier object changes the way it behaves. Can someone please help me out with this?

I'm doing something like this:

from sklearn.externals import joblib
joblib.dump(grid, 'grid_trained.pkl')


Here, grid is the trained classifier object. When I unpickle it, it acts very differently from when it is used directly.
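As background to the accepted answer below: pickling serializes an object's attributes byte-for-byte, so a round-trip through pickle should restore a model's learned state exactly. A minimal, self-contained sketch of that principle, using a toy stand-in class (ToyClassifier is hypothetical, not part of scikit-learn):

```python
import pickle

class ToyClassifier:
    """Stand-in for a trained model: its 'learned' state is a single
    threshold, and predict() labels inputs 1 if they reach it."""
    def __init__(self, threshold):
        self.threshold = threshold  # learned state captured by pickle

    def predict(self, values):
        return [1 if v >= self.threshold else 0 for v in values]

# "Train" and evaluate
clf = ToyClassifier(threshold=0.5)
before = clf.predict([0.2, 0.7, 0.5])

# Round-trip through pickle (in memory here; a file behaves the same)
restored = pickle.loads(pickle.dumps(clf))
after = restored.predict([0.2, 0.7, 0.5])

assert before == after  # identical state -> identical predictions
print(before, after)
```

If the restored model gives different scores, the difference almost always comes from the surrounding evaluation code rather than from pickle itself.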

Recommended answer


There should not be any difference, as @AndreasMueller stated. Here's a modified example from http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html#loading-the-20-newgroups-dataset using pickle:

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Set labels and data
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)

# Vectorize data
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)

# TF-IDF transformation
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

# Train classifier
clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)

# Tag new data
docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)

answers = [(doc, twenty_train.target_names[category]) for doc, category in zip(docs_new, predicted)]


# Pickle the classifier
import pickle
with open('clf.pk', 'wb') as fout:
    pickle.dump(clf, fout)

# Let's clear the classifier
clf = None

with open('clf.pk', 'rb') as fin:
    clf = pickle.load(fin)

# Retag new data
docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)

answers_from_loaded_clf = [(doc, twenty_train.target_names[category]) for doc, category in zip(docs_new, predicted)]

assert answers_from_loaded_clf == answers
print("Answers from freshly trained classifier and loaded pre-trained classifier are the same !!!")

The same holds when using sklearn.externals.joblib:

# Pickle the classifier
from sklearn.externals import joblib
joblib.dump(clf, 'clf.pk')

# Let's clear the classifier
clf = None

# Loads the pretrained classifier
clf = joblib.load('clf.pk')

# Retag new data
docs_new = ['God is love', 'OpenGL on the GPU is fast']
X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)

answers_from_loaded_clf = [(doc, twenty_train.target_names[category]) for doc, category in zip(docs_new, predicted)]

assert answers_from_loaded_clf == answers
print("Answers from freshly trained classifier and loaded pre-trained classifier are the same !!!")
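One frequent cause of the kind of score discrepancy described in the question (though the question does not show its evaluation code) is persisting only the classifier while re-fitting the vectorizer on the test data, which silently changes the feature mapping. A toy, stdlib-only sketch of that pitfall, using hypothetical fit_vocabulary/transform helpers as stand-ins for a fitted CountVectorizer:

```python
import pickle

def fit_vocabulary(docs):
    """Build a word -> column index mapping from a corpus."""
    vocab = {}
    for doc in docs:
        for word in doc.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def transform(vocab, doc):
    """Bag-of-words count vector using a FIXED, previously fitted vocabulary."""
    counts = [0] * len(vocab)
    for word in doc.split():
        if word in vocab:
            counts[vocab[word]] += 1
    return counts

train_docs = ["god is love", "opengl is fast"]
vocab = fit_vocabulary(train_docs)

# Persist the fitted vocabulary TOGETHER with the model state, so the
# exact same feature mapping is used after loading.
restored = pickle.loads(pickle.dumps({"vocab": vocab}))["vocab"]

doc = "god is fast"
assert transform(vocab, doc) == transform(restored, doc)

# Re-fitting on new data instead assigns different columns entirely:
refit = fit_vocabulary([doc])
print(transform(vocab, doc), transform(refit, doc))
```

The same reasoning applies to the answer's code above: count_vect and tfidf_transformer live on in memory there, but in a real workflow they must be pickled alongside clf (or wrapped with it in a single persisted object) so that test data is transformed identically before and after loading.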
