Load and predict new data with sklearn

Problem description

I trained a logistic regression model, cross-validated it, and saved it to a file using the joblib module. Now I want to load this model and use it to predict new data. Is this the correct way to do it, especially the standardization? Should I call scaler.fit() on my new data too? In the tutorials I followed, scaler.fit was only used on the training set, so I'm a bit lost here.

Here is my code:

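import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Loading the saved model with joblib
model = joblib.load('model.pkl')

# New data to predict (every column except the last is used as a feature)
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]

# Standardize new data
# (this fits a brand-new scaler on the prediction set itself)
scaler = StandardScaler()
X_pred = scaler.fit(pr[pred_cols]).transform(pr[pred_cols])

pred = pd.Series(model.predict(X_pred))
print(pred)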

Recommended answer

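No, that's incorrect. Every data preparation step should be fit on the training data only, and then merely applied (transformed) to new data. Otherwise you risk applying a different transformation from the one the model was trained with, because the means and variances that StandardScaler estimates will almost certainly differ between the training data and the new data.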
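To make the difference concrete, here is a small stdlib-only sketch. `Standardizer` is a hypothetical stand-in for StandardScaler (like StandardScaler, it uses the population standard deviation), and the numbers are made up for illustration. Refitting on the new batch re-centres it on its own mean, so the same raw values land on a different scale than the one the model saw during training.

```python
# Minimal stdlib sketch of "fit on train, only transform new data".
# Standardizer is a hypothetical stand-in for sklearn's StandardScaler.
from statistics import mean, pstdev


class Standardizer:
    """Learns a single mean/std pair from the data passed to fit()."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values)  # population std (ddof=0)
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train = [10.0, 20.0, 30.0, 40.0]
new = [30.0, 40.0]

# Correct: the statistics come from the training data only.
scaler = Standardizer().fit(train)
correct = scaler.transform(new)

# Incorrect: refitting on the new data learns different statistics,
# so the new batch is re-centred on its own mean.
refit = Standardizer().fit(new).transform(new)

print(correct)
print(refit)  # -> [-1.0, 1.0]
```

With the training statistics, 30 and 40 stay above the training mean (both values positive); after refitting, the smaller one is pushed below zero even though nothing about the raw value changed.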

The easiest way to train, save, load, and apply all the steps together is to use a Pipeline:

At training time:

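# Prepare the pipeline
import joblib  # older sklearn versions exposed this as sklearn.externals.joblib (since removed)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# X_train, y_train: your training features and labels
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)  # the scaler's mean/variance are learned from X_train only
joblib.dump(pipe, 'model.pkl')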

At prediction time:

import joblib
import pandas as pd

# Loading the saved pipeline with joblib
pipe = joblib.load('model.pkl')

# New data to predict
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]

# Apply the whole pipeline: the stored scaler only transforms, it is not refit
pred = pd.Series(pipe.predict(pr[pred_cols]))
print(pred)
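Conceptually, the pipeline solves the problem because the fitted object carries each step's learned parameters with it: fitting fits every transformer on the training data, while predicting only calls `transform` with the stored statistics. The stdlib-only sketch below illustrates that idea under simplifying assumptions: `TinyPipeline`, `Standardizer`, and `ThresholdClassifier` are hypothetical toy classes for a single feature, not sklearn's implementation, and `pickle` stands in for joblib (which builds on pickle).

```python
# Conceptual sketch of what a fitted pipeline stores and replays.
# TinyPipeline, Standardizer and ThresholdClassifier are hypothetical toys,
# not sklearn's Pipeline, StandardScaler or LogisticRegression.
import pickle
from statistics import mean, pstdev


class Standardizer:
    def fit(self, xs, ys=None):
        self.mean_, self.std_ = mean(xs), pstdev(xs)
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]


class ThresholdClassifier:
    """Predicts 1 when the (scaled) feature exceeds a learned threshold."""

    def fit(self, xs, ys):
        # Threshold halfway between the largest class-0 and smallest class-1 value.
        largest_0 = max(x for x, y in zip(xs, ys) if y == 0)
        smallest_1 = min(x for x, y in zip(xs, ys) if y == 1)
        self.threshold_ = (largest_0 + smallest_1) / 2
        return self

    def predict(self, xs):
        return [int(x > self.threshold_) for x in xs]


class TinyPipeline:
    def __init__(self, *steps):
        self.steps = steps

    def fit(self, xs, ys):
        for step in self.steps[:-1]:
            xs = step.fit(xs, ys).transform(xs)  # fit on training data only
        self.steps[-1].fit(xs, ys)
        return self

    def predict(self, xs):
        for step in self.steps[:-1]:
            xs = step.transform(xs)  # reuse the stored training statistics
        return self.steps[-1].predict(xs)


X_train = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y_train = [0, 0, 0, 1, 1, 1]

pipe = TinyPipeline(Standardizer(), ThresholdClassifier()).fit(X_train, y_train)

# Save and reload, roughly what joblib.dump/joblib.load do for the real pipeline.
restored = pickle.loads(pickle.dumps(pipe))
print(restored.predict([2.5, 9.5]))  # -> [0, 1]
```

The reloaded object still holds the training mean (5.5 here), so new points are scaled exactly as the training data was, without any refitting at prediction time.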
