How to normalize the train and test data using MinMaxScaler (sklearn)
Question
I have a doubt and have been looking for answers. Suppose I use:
import pandas as pd
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)
After that I train and test the model (A and B as features, C as the label) and get some accuracy score. My doubt is: what happens when I have to predict the label for a new set of data? Say,
data = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})
Because when I normalize the columns, the values of A and B will be scaled according to the new data, not the data the model was trained on. So the data preparation step for the new data will be:
data[['A','B']] = min_max_scaler.fit_transform(data[['A','B']])
The values of A and B will then be scaled with respect to the Max and Min of data[['A','B']], whereas the preparation of df[['A','B']] was done with respect to the Min and Max of df[['A','B']].
How can the data preparation be valid when the two datasets are scaled with respect to different numbers? I don't understand how the prediction can be correct here.
Answer
You should fit the MinMaxScaler using the training data and then apply the scaler to the testing data before the prediction.
In summary:
- Step 1: fit the scaler on the TRAINING data
- Step 2: use the scaler to transform the TRAINING data
- Step 3: use the transformed training data to fit the predictive model
- Step 4: use the scaler to transform the TEST data
- Step 5: predict using the trained model (step 3) and the transformed TEST data (step 4).
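To see why step 4 must reuse the training statistics, here is the mapping MinMaxScaler applies with its default feature_range=(0, 1), where the minimum and maximum of each feature j are the values learned during fit. The worked numbers plug in the question's training columns (A spans 1 to 16, B spans 4 to 20) and the first new row (A = 25, B = 2):

```latex
x'_j = \frac{x_j - \min_j^{\text{train}}}{\max_j^{\text{train}} - \min_j^{\text{train}}}

A = 25:\quad \frac{25 - 1}{16 - 1} = \frac{24}{15} = 1.6
\qquad
B = 2:\quad \frac{2 - 4}{20 - 4} = -\frac{2}{16} = -0.125
```

New values outside the training range therefore land outside [0, 1]. That is expected: the new rows are placed on exactly the same scale the model saw during training.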
Example using your data:
import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVC
min_max_scaler = preprocessing.MinMaxScaler()
#training data
df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
#fit and transform the training data and use them for the model training
df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)
#fit the model (SVC is used here only as an example; any sklearn classifier works)
model = SVC()
model.fit(df[['A','B']], df['C'])
#after the model training on the transformed training data define the testing data df_test
df_test = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})
#before the prediction of the test data, ONLY APPLY the scaler on them (transform, never fit again)
df_test[['A','B']] = min_max_scaler.transform(df_test[['A','B']])
#test the model
y_predicted_from_model = model.predict(df_test[['A','B']])
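A minimal sketch contrasting the two options on the question's numbers (the names train, new, new_ok and new_refit are illustrative only). transform reuses the min/max learned from the training rows, whereas calling fit_transform on the new rows rescales them with their own min/max, so the same raw value ends up at a different scaled value than the model saw during training:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Training features from the question and the new (unseen) rows
train = pd.DataFrame({'A': [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
                      'B': [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19]})
new = pd.DataFrame({'A': [25, 67, 24, 76, 23], 'B': [2, 54, 22, 75, 19]})

# Learn min/max from the training data only
scaler = MinMaxScaler().fit(train)
print("training min:", scaler.data_min_, "training max:", scaler.data_max_)

# Correct: new rows are scaled with the TRAINING min/max
# (A=25 -> (25 - 1) / (16 - 1) = 1.6, B=2 -> (2 - 4) / (20 - 4) = -0.125)
new_ok = scaler.transform(new)

# Incorrect: refitting squeezes the new rows into [0, 1] using their OWN min/max
# (A=25 -> (25 - 23) / (76 - 23), a completely different value for the same input)
new_refit = MinMaxScaler().fit_transform(new)

print("transform with training scaler:", new_ok[0])
print("refit on new data:             ", new_refit[0])
```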
Example using the iris data:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
data = datasets.load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
#steps 1-2: fit the scaler on the training data and transform it
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
#step 3: fit the model on the transformed training data
model = SVC()
model.fit(X_train_scaled, y_train)
#step 4: only transform the test data (no refitting)
X_test_scaled = scaler.transform(X_test)
#step 5: predict with the trained model
y_pred = model.predict(X_test_scaled)
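The same five steps can also be bundled with sklearn's make_pipeline, so the scaler cannot accidentally be refitted on test data. A sketch on the same iris split (pipe is an illustrative name):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# fit() performs steps 1-3: the scaler is fitted on X_train only,
# X_train is transformed, and the SVC is trained on the scaled data.
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_train, y_train)

# predict() performs steps 4-5: X_test is only transformed with the
# training min/max, then passed to the trained SVC.
y_pred = pipe.predict(X_test)
print("test accuracy:", pipe.score(X_test, y_test))
```

The fitted pipeline keeps the training min/max inside its MinMaxScaler step, so calling pipe.predict on new rows later applies exactly the same scaling.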
Hope this helps.