How to normalize the Train and Test data using MinMaxScaler sklearn


Question

I have a doubt that I have been trying to resolve. The question arises when I use:

import pandas as pd
from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()

df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})

df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)

After this I train and test the model (A and B as features, C as the label) and get some accuracy score. My doubt is what happens when I have to predict the label for a new set of data, say:

data = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})

When I normalize these columns, the values of A and B are scaled according to the new data, not the data the model was trained on. So after the data preparation step below, my data will be:

data[['A','B']] = min_max_scaler.fit_transform(data[['A','B']])

The values of A and B in the new data will change with respect to the max and min of the new data, whereas the data preparation of df[['A','B']] was done with respect to the min and max of df[['A','B']].

How can the data preparation be valid when it is relative to different numbers? I don't understand how the prediction can be correct here.

Recommended answer

You should fit the MinMaxScaler using the training data and then apply the scaler on the testing data before the prediction.

In summary:

  • Step 1: fit the scaler on the TRAINING data
  • Step 2: use the scaler to transform the TRAINING data
  • Step 3: use the transformed training data to fit the predictive model
  • Step 4: use the scaler to transform the TEST data
  • Step 5: predict using the trained model (step 3) and the transformed TEST data (step 4).
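
To make these steps concrete, here is a minimal sketch of what the scaler does internally, assuming the default feature_range of (0, 1). Fitting stores the per-column minimum and maximum of the training data, and transforming reuses those stored values. The helper names fit_min_max and apply_min_max are hypothetical, written only for illustration. Their output is compared against MinMaxScaler on the A and B columns from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns A and B from the question (training data and new data)
A = [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9]
B = [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19]
X_train = np.column_stack([A, B]).astype(float)
X_new = np.column_stack([[25, 67, 24, 76, 23], [2, 54, 22, 75, 19]]).astype(float)

def fit_min_max(X):
    """Hypothetical helper: learn the per-column min and max (the 'fit' step)."""
    return X.min(axis=0), X.max(axis=0)

def apply_min_max(X, col_min, col_max):
    """Hypothetical helper: scale X with previously learned statistics (the 'transform' step)."""
    return (X - col_min) / (col_max - col_min)

# Steps 1 and 2: learn the statistics on the training data, then scale it
col_min, col_max = fit_min_max(X_train)
manual_train = apply_min_max(X_train, col_min, col_max)

# Step 4: scale the new data with the TRAINING statistics
manual_new = apply_min_max(X_new, col_min, col_max)

# Same operations with scikit-learn, for comparison
scaler = MinMaxScaler().fit(X_train)
sk_train = scaler.transform(X_train)
sk_new = scaler.transform(X_new)

print(np.allclose(manual_train, sk_train), np.allclose(manual_new, sk_new))
```

The point of the sketch is that the minimum and maximum are part of the fitted state, in the same way the model's parameters are, so they have to come from the training data only.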

Example using your data:

import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVC  # assumption: the question names no model; SVC is only an illustrative choice

min_max_scaler = preprocessing.MinMaxScaler()

# training data
df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})

# fit the scaler on the training data and transform it for the model training
df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)

# fit the model on the scaled features (A, B) and the labels (C)
model = SVC()
model.fit(df[['A','B']], df['C'])

# after training the model on the transformed training data, define the testing data df_test
df_test = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})

# before predicting on the test data, ONLY APPLY the fitted scaler (transform, not fit_transform)
df_test[['A','B']] = min_max_scaler.transform(df_test[['A','B']])

# predict with the trained model
y_predicted_from_model = model.predict(df_test[['A','B']])
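
To illustrate why the test data should only be transformed, the sketch below compares the two options on column A alone. The variable names are illustrative. The expected values follow from the training minimum (1) and maximum (16) of A.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Column A from the question, reshaped to the 2-D layout the scaler expects
A_train = np.array([1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9], dtype=float).reshape(-1, 1)
A_new = np.array([25, 67, 24, 76, 23], dtype=float).reshape(-1, 1)

scaler = MinMaxScaler().fit(A_train)

# Consistent with training: the raw value 25 maps to (25 - 1) / (16 - 1) = 1.6
consistent = scaler.transform(A_new)

# Refitting on the new data: the raw value 25 maps to (25 - 23) / (76 - 23) instead
refit = MinMaxScaler().fit_transform(A_new)

print(consistent.ravel())
print(refit.ravel())
```

With the refitted scaler, the same raw number lands at a different scaled position than it would have during training, which is exactly the mismatch described in the question. Values above 1 in the consistent version are expected, because the new data lies outside the training range. Whether the model generalizes well to such inputs is a separate matter.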


Example using the iris data:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

data = datasets.load_iris()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

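# fit the scaler on the training data only, then transform the training data
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# fit the model on the scaled training data
model = SVC()
model.fit(X_train_scaled, y_train)

# only transform the test data with the already-fitted scaler, then predict
X_test_scaled = scaler.transform(X_test)
y_pred = model.predict(X_test_scaled)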
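
As an optional alternative that is not part of the original answer, scikit-learn's make_pipeline can bundle the scaler and the model so that the fit-on-train, transform-on-test order is handled automatically. A minimal sketch, assuming the same iris split and an SVC with default settings:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# fit() fits the scaler on X_train, transforms X_train, then fits the SVC;
# predict() only transforms X_test with the fitted scaler before predicting
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_train, y_train)
y_pred_pipe = pipe.predict(X_test)

# The manual five-step version, for comparison
scaler = MinMaxScaler()
model = SVC()
model.fit(scaler.fit_transform(X_train), y_train)
y_pred_manual = model.predict(scaler.transform(X_test))

print((y_pred_pipe == y_pred_manual).all())
```

Keeping the scaler inside the pipeline also makes it harder to call fit_transform on the test data by accident.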

Hope this helps.

See also the post here: https://towardsdatascience.com/everything-you-need-to-know-about-min-max-normalization-in-python-b79592732b79
