How to normalize the Train and Test data using MinMaxScaler sklearn
Question
I have a doubt that I have been trying to resolve. The question arises when I use:
import pandas as pd
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)
After this I will train and test the model (A and B as features, C as the label) and get some accuracy score. My doubt is what happens when I have to predict the label for a new set of data, say:
data = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})
When I normalize these columns, the values of A and B will be scaled according to the new data, not according to the data the model was trained on. So after the data preparation step below, my data will become:
data[['A','B']] = min_max_scaler.fit_transform(data[['A','B']])
The values of A and B will change with respect to the max and min of the new data, whereas the data preparation of the original df[['A','B']] was done with respect to the min and max of df[['A','B']].
How can the data preparation be valid when it is based on different numbers? I don't understand how the prediction can be correct here.
Recommended answer
You should fit the MinMaxScaler using the training data only, and then apply the already fitted scaler to the testing data before the prediction.
In summary:
- Step 1: fit the scaler on the TRAINING data
- Step 2: use the scaler to transform the training data
- Step 3: use the transformed training data to fit the predictive model
- Step 4: use the scaler to transform the TEST data
- Step 5: predict using the trained model and the transformed TEST data
Example using your data:
import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVC  # assumption: the answer never defines `model`; any classifier would do
min_max_scaler = preprocessing.MinMaxScaler()
#training data
df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
#fit and transform the training data and use them for the model training
df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)
#fit the model on the scaled features A, B and the label C
model = SVC()
model.fit(df[['A','B']], df['C'])
#after the model training on the transformed training data define the testing data df_test
df_test = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})
#before the prediction of the test data, ONLY APPLY the scaler on them (transform, not fit_transform)
df_test[['A','B']] = min_max_scaler.transform(df_test[['A','B']])
#test the model
y_predicted_from_model = model.predict(df_test[['A','B']])
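To see what the difference between transform and fit_transform means for the new data, here is a small sketch added for illustration (it is not part of the original answer). It assumes the default feature_range=(0, 1), for which MinMaxScaler maps each column as (x - min) / (max - min) using the min and max stored at fit time. The names train, new, reused, manual and refit are made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same numbers as in the question: training features and new data.
train = pd.DataFrame({'A': [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
                      'B': [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19]})
new = pd.DataFrame({'A': [25, 67, 24, 76, 23], 'B': [2, 54, 22, 75, 19]})

# Fit once, on the training data only.
scaler = MinMaxScaler().fit(train)

# Reusing the fitted scaler applies the TRAINING min and max to the new data.
reused = scaler.transform(new)

# The same result computed by hand from the stored training statistics.
manual = (new.to_numpy() - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)

# Refitting on the new data uses the new data's own min and max instead,
# so the same raw value ends up at a different scaled value.
refit = MinMaxScaler().fit_transform(new)

print("training min/max:", scaler.data_min_, scaler.data_max_)
print("A, reused scaler:", reused[:, 0])
print("A, refit scaler: ", refit[:, 0])
```

With the reused scaler, new values that lie outside the training range land outside [0, 1]. That is expected: the point is that a raw value is always mapped to the same scaled value the model saw during training.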
Example using the iris data:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
data = datasets.load_iris()
X = data.data
y = data.target
#split first, so that the scaler never sees the test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
#fit the scaler on the training data and transform them
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
#fit the model on the scaled training data
model = SVC()
model.fit(X_train_scaled, y_train)
#only transform the test data with the already fitted scaler
X_test_scaled = scaler.transform(X_test)
y_pred = model.predict(X_test_scaled)
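The same five steps can also be bundled so that the scaler cannot be refit on test data by accident. This is a sketch of one possible alternative using scikit-learn's make_pipeline, not something the original answer proposes. The name pipe is illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# fit() fits the scaler on X_train only, then trains the SVC on the scaled data.
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_train, y_train)

# predict() applies the already fitted scaler's transform before calling the SVC.
y_pred = pipe.predict(X_test)
print("test accuracy:", pipe.score(X_test, y_test))
```

The predictions should match the manual fit_transform / transform version above, since the pipeline performs the same steps in the same order.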
Hope this helps.