How to pickle individual steps in sklearn's Pipeline?

Question

I am using Pipeline from sklearn to classify text.

In this example Pipeline, the steps are a FeatureUnion (which wraps a TfidfVectorizer and some custom features) followed by a classifier. I then fit the training data and run the prediction:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']

# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)

# CustomFeatures is my own transformer class (definition not shown here)
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        ('custom_features', CustomFeatures()),
    ])),
    ('clf', LinearSVC1),
])

pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)

# etc.

Here I need to pickle the TfidfVectorizer step and leave custom_features unpickled, since I am still experimenting with them. The idea is to make the pipeline faster by pickling the tfidf step, so that it does not have to be refitted on every run.

I know I can pickle the whole Pipeline with joblib.dump, but how do I pickle individual steps?

Recommended answer

To pickle the TfidfVectorizer, you could use the following (with joblib imported and dump_path set to the target file path):

joblib.dump(pipeline.steps[0][1].transformer_list[0][1], dump_path)

Or:

joblib.dump(pipeline.get_params()['features__tfidf'], dump_path)

To load the dumped object, you can use:

# each transformer_list entry is a (name, transformer) tuple, so replace the whole entry
pipeline.steps[0][1].transformer_list[0] = ('tfidf', joblib.load(dump_path))
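
For reference, the dump and load calls can be combined into one round trip. The following is only a minimal sketch: CustomFeatures is left out because its definition is not shown in the question, and the temporary dump_path is a hypothetical location chosen for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']

# CustomFeatures from the question is omitted: its definition is not shown.
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
    ])),
    ('clf', LinearSVC(tol=1e-4, C=0.1)),
])
pipeline.fit(X, Y)
pred_before = pipeline.predict(X_dev)

# Hypothetical dump location: a file in a fresh temporary directory.
dump_path = os.path.join(tempfile.mkdtemp(), 'tfidf.joblib')

# Dump only the fitted tf-idf step, looked up by its parameter name.
fitted_tfidf = pipeline.get_params()['features__tfidf']
joblib.dump(fitted_tfidf, dump_path)

# Load it back and swap it into the FeatureUnion. Each entry of
# transformer_list is a (name, transformer) tuple, so the whole entry is replaced.
restored_tfidf = joblib.load(dump_path)
pipeline.named_steps['features'].transformer_list[0] = ('tfidf', restored_tfidf)
pred_after = pipeline.predict(X_dev)
```

The restored vectorizer keeps its fitted vocabulary, so the pipeline can predict again without refitting the tf-idf step.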

Unfortunately, at the time of writing you can't use set_params (the inverse of get_params) to insert the estimator by name. That will become possible if the changes in PR#1769 (enable setting pipeline components as parameters) are ever merged!
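
As a side note, more recent scikit-learn releases do seem to accept a step name in set_params, so the name-based replacement may already work for you. The sketch below assumes such a release is installed; the replacement vectorizer is just a freshly constructed stand-in for an object loaded with joblib.load.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ('features', FeatureUnion([('tfidf', TfidfVectorizer())])),
    ('clf', LinearSVC()),
])

# Stand-in for a vectorizer restored with joblib.load(dump_path).
replacement = TfidfVectorizer(ngram_range=(1, 3), max_features=4000)

# Assumption: the installed scikit-learn allows replacing a named component
# through set_params, using the same key that get_params reports.
pipeline.set_params(features__tfidf=replacement)

current_tfidf = pipeline.get_params()['features__tfidf']
```

If set_params raises an error on your version, the index-based assignment into transformer_list remains the fallback.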
