How to normalize the Train and Test data using MinMaxScaler sklearn


Question

So, I have this doubt and have been looking for answers. The question is: when I use

import pandas as pd
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()

df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})

df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)

After that, I train and test the model (A and B as features, C as the label) and get an accuracy score. My doubt is what happens when I have to predict the label for a new set of data. Say,

data = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})

When I normalize these columns, the values of A and B will be scaled according to the new data, not the data the model was trained on. After the data preparation step below, my new data becomes:

data[['A','B']] = min_max_scaler.fit_transform(data[['A','B']])

The values of A and B will then be scaled with respect to the min and max of data[['A','B']], whereas the training data df[['A','B']] was scaled with respect to the min and max of df[['A','B']].

How can the data preparation be valid when the two datasets are scaled relative to different numbers? I don't understand how the prediction can be correct here.
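The mismatch described above can be reproduced directly. The sketch below (variable names `train`, `new`, `train_scaler` are illustrative, not from the question) scales the same new rows twice: once with a scaler fitted on the training data, and once with a scaler refitted on the new rows, as `fit_transform(data[['A','B']])` would do.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({'A': [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
                      'B': [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19]})
new = pd.DataFrame({'A': [25, 67, 24, 76, 23], 'B': [2, 54, 22, 75, 19]})

# Scaler fitted on the training data: the model was trained on this scale
train_scaler = MinMaxScaler().fit(train[['A', 'B']])
on_train_scale = train_scaler.transform(new[['A', 'B']])

# Refitting on the new rows uses THEIR min/max, i.e. a different scale
refit_scale = MinMaxScaler().fit_transform(new[['A', 'B']])

print(np.round(on_train_scale, 4))
print(np.round(refit_scale, 4))
```

For the first new row, A = 25 maps to (25 - 1) / 15 = 1.6 on the training scale, but to (25 - 23) / 53, about 0.038, after refitting, so the model would see very different inputs for the same raw value.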

Recommended answer

You should fit the MinMaxScaler on the training data only, and then apply that same fitted scaler to the test data (with transform, not fit_transform) before the prediction.
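This works because `fit` makes MinMaxScaler record each column's minimum and maximum, and `transform` then applies `(x - min) / (max - min)` column by column using those stored values, whatever data it is given. A minimal check on the question's training columns, reproducing the transform by hand from the fitted attributes `data_min_`, `data_max_` and `data_range_`:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Training features from the question
train = pd.DataFrame({'A': [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
                      'B': [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19]})

scaler = MinMaxScaler()
scaled = scaler.fit_transform(train[['A', 'B']])

# fit() stored the per-column minimum and maximum of the TRAINING data
print(scaler.data_min_)  # [1. 4.]
print(scaler.data_max_)  # [16. 20.]

# With the default feature_range=(0, 1), transform is (x - min) / (max - min)
manual = (train[['A', 'B']].to_numpy() - scaler.data_min_) / scaler.data_range_
print(np.allclose(scaled, manual))  # True
```

Any later call to `scaler.transform(...)` reuses these training minima and maxima, which is exactly what keeps the test data on the model's scale.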


In summary:

  • Step 1: fit the scaler on the TRAINING data
  • Step 2: use the scaler to transform the training data
  • Step 3: use the transformed training data to fit the predictive model
  • Step 4: use the scaler to transform the TEST data
  • Step 5: predict using the trained model and the transformed TEST data
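The five steps can also be bundled with scikit-learn's `Pipeline` (a swap-in, not what the answer itself uses): `fit` performs steps 1 to 3 on the training data, and `predict` performs steps 4 and 5, so the test data can only ever be transformed with the training min/max. A minimal sketch on the question's data, assuming an `SVC` classifier as the model:

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

df = pd.DataFrame({'A': [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
                   'B': [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19],
                   'C': ['Y', 'Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'Y', 'N', 'N', 'Y', 'Y']})
X_train = df[['A', 'B']]
y_train = (df['C'].str.strip() == 'Y').astype(int)  # 'N' -> 0, 'Y' -> 1

# Steps 1-3: fit() fits the scaler on X_train, transforms it, then fits the SVC
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_train, y_train)

# Steps 4-5: predict() only calls transform() on the new rows, never fit()
df_test = pd.DataFrame({'A': [25, 67, 24, 76, 23], 'B': [2, 54, 22, 75, 19]})
y_pred = pipe.predict(df_test)

# The scaler inside the pipeline still holds the training min/max
scaler = pipe.named_steps['minmaxscaler']
print(scaler.data_min_, scaler.data_max_)
print(y_pred)
```

A side benefit of this design is that the scaler and the model travel together, so there is no separate scaler object to forget when predicting later.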

Example using the question's data:

import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVC

min_max_scaler = preprocessing.MinMaxScaler()
#training data
df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
#fit and transform the training data and use them for the model training
df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)

#fit the model (SVC is only an example; any sklearn classifier works the same way)
model = SVC()
model.fit(df[['A','B']], df['C'])

#after the model training on the transformed training data define the testing data df_test
df_test = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})

#before the prediction of the test data, ONLY APPLY the scaler on them (transform, not fit_transform)
df_test[['A','B']] = min_max_scaler.transform(df_test[['A','B']])

#test the model
y_predicted_from_model = model.predict(df_test[['A','B']])
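Note that `transform` does not force the test data into [0, 1]. Every test value of A lies above the training maximum of 16, so those rows map above 1, and B = 2 (below the training minimum of 4) maps below 0. That is expected: it keeps the test rows on the same scale the model was trained on. If a bounded range is required, `MinMaxScaler(clip=True)` (available since scikit-learn 0.24) clips transformed values to `feature_range`; whether clipping suits a given model is a judgment call. A small check:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({'A': [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
                      'B': [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19]})
test = pd.DataFrame({'A': [25, 67, 24, 76, 23], 'B': [2, 54, 22, 75, 19]})

# Default: test values are scaled with the TRAINING min/max and may leave [0, 1]
scaled = MinMaxScaler().fit(train).transform(test)
print(np.round(scaled, 4))

# clip=True bounds the transformed values to feature_range (default (0, 1))
clipped = MinMaxScaler(clip=True).fit(train).transform(test)
print(np.round(clipped, 4))
```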


Example using the iris data:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

data = datasets.load_iris()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = SVC()
model.fit(X_train_scaled, y_train)

X_test_scaled = scaler.transform(X_test)
y_pred = model.predict(X_test_scaled)
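When genuinely new data arrives later (the situation in the question), the fitted scaler has to be reused rather than refitted, which in practice usually means saving it alongside the model. A minimal sketch, assuming `joblib` (a scikit-learn dependency) for persistence and a hypothetical file name `iris_scaler_model.joblib` in a temporary directory:

```python
import os
import tempfile

import joblib
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = MinMaxScaler().fit(X_train)
model = SVC().fit(scaler.transform(X_train), y_train)

# Save the fitted scaler and model together (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), 'iris_scaler_model.joblib')
joblib.dump({'scaler': scaler, 'model': model}, path)

# Later, e.g. in another process: load both and only call transform() on new data
saved = joblib.load(path)
X_new = X_test[:5]  # stand-in for genuinely new measurements
y_new = saved['model'].predict(saved['scaler'].transform(X_new))
print(y_new)
```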

Hope this helps.
