Python-Scikit. Training and testing data using SVM

Question

I am training and testing data with an SVM (scikit-learn). I train the SVM and save it as a pickle, then use that pickle to test my system. First, I read the training data and the testing data into the variables train_data and test_data respectively.

After that, the code I use for training is:

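from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm
import joblib  # replaces sklearn.externals.joblib, which newer scikit-learn versions no longer ship

# Fit the TF-IDF vocabulary and weights on the training text
vectorizer = TfidfVectorizer(max_df=0.8,
                             sublinear_tf=True,
                             use_idf=True)
train_vectors = vectorizer.fit_transform(train_data)
test_vectors = vectorizer.transform(test_data)

# Train the SVM (SVC uses an RBF kernel by default) and save it to disk
classifier_rbf = svm.SVC()
classifier_rbf.fit(train_vectors, train_labels)
joblib.dump(classifier_rbf, 'pickl/train_rbf_SVM.pkl', compress=1)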

When testing, I again read the training data and the testing data into the variables train_data and test_data respectively. The code I use for testing is:

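from sklearn.feature_extraction.text import TfidfVectorizer
import joblib  # replaces sklearn.externals.joblib, which newer scikit-learn versions no longer ship

# Refit the vectorizer on the training data so the test data gets the same features
vectorizer = TfidfVectorizer(max_df=0.8,
                             sublinear_tf=True,
                             use_idf=True)
train_vectors = vectorizer.fit_transform(train_data)
test_vectors = vectorizer.transform(test_data)

# Load the trained SVM and predict on the test vectors
classifier_rbf = joblib.load('pickl/train_rbf_SVM.pkl')
prediction_rbf = classifier_rbf.predict(test_vectors)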

This code works fine and gives me the correct output. My question is: do I have to read the training data every time I want to run a test?

Thanks.

Recommended answer

In your case, yes, because you are not saving (pickling) the TfidfVectorizer. The test data must be transformed in exactly the same way as the training data for the predictions to be meaningful. So if you do not want to read the training data again and again, pickle the fitted TfidfVectorizer along with the estimator and unpickle both during testing.
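To see why the fitted vectorizer matters, here is a minimal sketch using made-up toy sentences (the data and the variable names `fitted` and `refit` are illustrative assumptions, not part of the original question). A vectorizer fitted on different text learns a different vocabulary, so its columns no longer line up with the features the classifier was trained on.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data, standing in for the real train_data / test_data
train_data = ["good movie", "great film", "bad movie"]
test_data = ["awful plot", "great plot twist"]

# Vectorizer fitted on the training text (what the classifier was trained with)
fitted = TfidfVectorizer().fit(train_data)

# Vectorizer wrongly refitted on the test text: it learns a different vocabulary
refit = TfidfVectorizer().fit(test_data)

train_vocab = sorted(fitted.vocabulary_)
test_vocab = sorted(refit.vocabulary_)
print("training vocabulary:", train_vocab)
print("refit vocabulary:   ", test_vocab)

# Only the fitted vectorizer yields test vectors whose columns match training
n_train_features = fitted.transform(train_data).shape[1]
test_vectors = fitted.transform(test_data)
mismatched_vectors = refit.transform(test_data)
print(test_vectors.shape[1] == n_train_features)
print(mismatched_vectors.shape[1] == n_train_features)
```

Words that the fitted vectorizer never saw during training are simply ignored at transform time, which is the behaviour the classifier expects.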

You may also want to look at the Pipeline class in scikit-learn. It combines data preprocessing and estimation into a single object that you can pickle and unpickle easily, without having to worry about saving and loading the individual parts of the training setup.
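A minimal sketch of the Pipeline approach might look like the following. The toy data, the step names "tfidf" and "svm", and the temporary file name are assumptions made for illustration; the vectorizer settings are copied from the question.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy data, standing in for train_data / train_labels / test_data
train_data = ["good movie", "great film", "bad movie", "terrible film"]
train_labels = ["pos", "pos", "neg", "neg"]
test_data = ["great movie", "terrible movie"]

# One object holds both the vectorizer and the classifier
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_df=0.8, sublinear_tf=True, use_idf=True)),
    ("svm", SVC()),
])
model.fit(train_data, train_labels)  # takes raw text, vectorizes internally

# Save the whole pipeline in one file (a temporary path, for illustration only)
model_path = os.path.join(tempfile.mkdtemp(), "tfidf_svm_pipeline.pkl")
joblib.dump(model, model_path, compress=1)

# At test time: load once and predict directly on raw text
loaded = joblib.load(model_path)
predictions = loaded.predict(test_data)
print(predictions)
```

With this layout the training data is only needed when fitting; testing needs just the saved file and the raw test text.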

Edit - adding code

When training for the first time, add this line at the end of your code:

joblib.dump(vectorizer, 'pickl/train_vectorizer.pkl', compress=1)

Now, when testing, there is no need to load the training data. Just load the already fitted vectorizer:

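import joblib

# Load the trained classifier and the fitted vectorizer saved during training
classifier_rbf = joblib.load('pickl/train_rbf_SVM.pkl')
vectorizer = joblib.load('pickl/train_vectorizer.pkl')

# Transform the test data with the fitted vectorizer, then predict
test_vectors = vectorizer.transform(test_data)
prediction_rbf = classifier_rbf.predict(test_vectors)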
