How do I store a TfidfVectorizer for future use in scikit-learn?

Views: 67 Published: 2021/6/28 19:22:31 Tags: python python-3.x scikit-learn tf-idf joblib


Question

I have a TfidfVectorizer that vectorizes a collection of articles, followed by feature selection.

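from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(corpus)
selector = SelectKBest(chi2, k=5000)
X_train_sel = selector.fit_transform(X_train, y_train)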

Now I want to store this and use it in other programs. I don't want to re-run the TfidfVectorizer() and the feature selector on the training dataset. How do I do that? I know how to make a model persistent using joblib, but I wonder whether this works the same way as making a model persistent.

Recommended answer

You can simply use the built-in pickle library:

import pickle

with open("vectorizer.pickle", "wb") as f:
    pickle.dump(vectorizer, f)
with open("selector.pickle", "wb") as f:
    pickle.dump(selector, f)

And to load them:

with open("vectorizer.pickle", "rb") as f:
    vectorizer = pickle.load(f)
with open("selector.pickle", "rb") as f:
    selector = pickle.load(f)

Pickle will serialize the objects to disk and load them back into memory when you need them.
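To illustrate the full round trip, here is a minimal sketch. The toy corpus, the labels, `k=2`, and the temporary directory are made up for illustration; your own `corpus`, `y_train`, and `k` would take their place. The point it tries to show is that the reloaded objects only need `transform`, never `fit`.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Toy training data (hypothetical; stands in for the real corpus and labels).
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices rose sharply today",
    "the market fell after the report",
]
y_train = [0, 0, 1, 1]

# Fit once, in the training program.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(corpus)
selector = SelectKBest(chi2, k=2)  # k=2 only because the toy vocabulary is tiny
X_train_sel = selector.fit_transform(X_train, y_train)

# Persist both fitted objects.
tmp_dir = tempfile.mkdtemp()
vec_path = os.path.join(tmp_dir, "vectorizer.pickle")
sel_path = os.path.join(tmp_dir, "selector.pickle")
with open(vec_path, "wb") as f:
    pickle.dump(vectorizer, f)
with open(sel_path, "wb") as f:
    pickle.dump(selector, f)

# Later, in another program: load and call transform only (no refitting).
with open(vec_path, "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open(sel_path, "rb") as f:
    loaded_selector = pickle.load(f)

new_docs = ["the cat chased the dog"]
X_new_sel = loaded_selector.transform(loaded_vectorizer.transform(new_docs))
print(X_new_sel.shape)
```

It is safest to load the files with the same scikit-learn version that saved them, since scikit-learn does not guarantee that pickles load correctly across versions.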

See the pickle library documentation.
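Since the question mentions joblib: the same idea should carry over to `joblib.dump` and `joblib.load`, because a fitted vectorizer or selector is persisted the same way as a fitted model. The sketch below is one possible variation and not part of the original answer. It wraps both steps in a `Pipeline` so that a single file holds everything. The toy data, the step names `"tfidf"` and `"select"`, and the file name are assumptions.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical).
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock prices rose sharply today",
    "the market fell after the report",
]
y_train = [0, 0, 1, 1]

# Chain vectorization and feature selection into one object.
features = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2, k=2)),  # small k for the toy vocabulary
])
X_train_sel = features.fit_transform(corpus, y_train)

# Persist the whole fitted pipeline to a single file.
path = os.path.join(tempfile.mkdtemp(), "features.joblib")
joblib.dump(features, path)

# In another program: load it and transform new text without refitting.
loaded = joblib.load(path)
X_new_sel = loaded.transform(["the cat chased the dog"])
print(X_new_sel.shape)
```

With a single pipeline object there is only one file to keep in sync, at the cost of not being able to reuse the vectorizer on its own without pulling it out of `loaded.named_steps`.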
