Should I normalize training and test sets separately after shuffling and splitting?


Question

I want to normalize my data to the range [0, 1]. Should I normalize the data after shuffling and splitting? Should I repeat the same procedure for the test set? I came across some Python code that uses this type of normalization. Is this the correct way to normalize data to a target range of [0, 1]?

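`import numpy as np
X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
a = X_train
lis = []  # collects one rescaled column per feature
for i in range(a.shape[1]):
    # min-max rescale column i from [min, max] to [0, 1]
    old_range = np.amax(a[:, i]) - np.amin(a[:, i])
    new_range = 1 - 0
    f = ((a[:, i] - np.amin(a[:, i])) / old_range) * new_range + 0
    lis.append(f)
# the list holds columns, so transpose to get back to (samples, features)
b = np.transpose(np.array(lis))
print(b)`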

This is the result after normalization:

`[[0.5, 0., 1.]
[1., 0.5, 0.33333333]
[0., 1., 0.]]`

Recommended answer

Should I normalize data after shuffling and splitting?

Yes. Otherwise, you leak information from the future (here, the test set) into training, because the scaling parameters would be computed from data the model is not supposed to have seen. More information here; it covers standardization rather than normalization (and R rather than Python), but the arguments are equally applicable.
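To make the order of operations concrete, here is a minimal sketch, assuming scikit-learn is installed. The toy array `X`, the `test_size` value, and the `random_state` seed are illustrative choices, not taken from the question. The data is shuffled and split first, and the scaler learns its min and max from the training part only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 10 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

# 1) Shuffle and split BEFORE computing any scaling statistics
X_train, X_test = train_test_split(
    X, test_size=0.3, shuffle=True, random_state=0
)

# 2) Learn per-feature min and max from the training rows only
scaler = MinMaxScaler().fit(X_train)

# 3) Apply the same fitted transformation to both parts
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the full `X` before splitting would let the test rows influence the min and max, which is the leak described above.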

Should I repeat the same procedure for test?

Yes, using the scaler that was fitted on the training dataset. In this case, that means using the max and min from the training dataset to scale the test dataset. This keeps the test data consistent with the transformation applied to the training data and makes it possible to evaluate whether the model generalizes well.
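As a rough illustration of what "using the max and min from the training dataset" means, the following sketch does the scaling by hand with NumPy. The helper name `minmax_scale_with` is made up for this example; the arrays are the small ones used elsewhere in this thread.

```python
import numpy as np

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[0., -1., 1.5], [2.5, 0., 1.]])

# Per-feature statistics come from the training data only
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)

def minmax_scale_with(X, lo, hi):
    """Rescale X column-wise so that lo maps to 0 and hi maps to 1."""
    return (X - lo) / (hi - lo)

X_train_scaled = minmax_scale_with(X_train, train_min, train_max)
X_test_scaled = minmax_scale_with(X_test, train_min, train_max)

print(X_test_scaled)
```

Note that scaled test values are not guaranteed to stay inside [0, 1]: the first feature of the second test row is 2.5, above the training maximum of 2, so it maps to 1.25. That is expected when the test set contains values outside the range seen during training.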

You do not have to code it from scratch. Use sklearn:

import numpy as np
from sklearn import preprocessing

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[0., -1., 1.5], [2.5, 0., 1.]])

# Fit on the training data only: this stores the per-feature min and max
scaler = preprocessing.MinMaxScaler()
scaler = scaler.fit(X_train)

# Apply the same fitted scaler to both the training and the test data
X_train_minmax = scaler.transform(X_train)
X_test_minmax = scaler.transform(X_test)

Note: for most applications, standardization is the recommended approach to scaling; use preprocessing.StandardScaler().
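For comparison, a minimal sketch of the standardization variant, assuming the same toy arrays as in the min-max example. The workflow is unchanged: the mean and standard deviation are estimated on the training data and then reused for the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[0., -1., 1.5], [2.5, 0., 1.]])

# Estimate per-feature mean and standard deviation on the training data only
scaler = StandardScaler().fit(X_train)

# Reuse those training statistics for both sets
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```

After this, each training feature has zero mean and unit variance; the test features generally will not, since they are shifted and scaled by the training statistics.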
