Scaling test data to 0 and 1 using MinMaxScaler

Question

Using the MinMaxScaler from sklearn, I scale my data as below.

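from sklearn import preprocessing

# Fit the scaler on the training data only, then reuse it for the test data
min_max_scaler = preprocessing.MinMaxScaler()
X_train_scaled = min_max_scaler.fit_transform(features_train)
X_test_scaled = min_max_scaler.transform(features_test)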

However, when I print X_test_scaled.min(), I get some negative values, so the scaled test data does not fall entirely between 0 and 1. This happens because the lowest value in my test data is lower than the lowest value in the training data, on which the min-max scaler was fit.

How much does it affect the SVM classifier if the data are not normalized exactly to the range 0 to 1? Also, is it bad practice to concatenate the train and test data into a single matrix, perform min-max scaling to ensure the values are between 0 and 1, and then separate them again?

Recommended answer

For this scaling it probably doesn't matter much in practice, but in general you should not use your test data to estimate any parameters of the preprocessing. Doing so can severely bias your results for more complex preprocessing steps.
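As a rough illustration of why the test values leave the [0, 1] range, the sketch below uses small made-up arrays (not the asker's data). The scaler learns the minimum and maximum from the training set only, so any test value below the training minimum maps to a negative number and any value above the training maximum maps to something greater than 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: one feature whose training range is [2, 10]
features_train = np.array([[2.0], [6.0], [10.0]])
features_test = np.array([[0.0], [6.0], [12.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(features_train)  # learns min=2, max=10
X_test_scaled = scaler.transform(features_test)        # reuses those parameters

# Each test value is mapped with (x - 2) / (10 - 2):
# 0 -> -0.25, 6 -> 0.5, 12 -> 1.25
print(X_test_scaled.ravel())
```

The training data ends up exactly in [0, 1], while the test data does not. That is the expected behaviour when the scaler is fit without looking at the test set.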

There is really no reason to concatenate the data here; the SVM will deal with it. If you were using a model that needs positive values and your scaled test data is not guaranteed to be positive, you might consider a strategy other than MinMaxScaler.
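One possible option when a model strictly requires bounded inputs is to clip the transformed test values to the range learned from the training data. This is only a sketch and not something the answer itself prescribes. It assumes scikit-learn 0.24 or later, where MinMaxScaler accepts a clip argument, and the arrays are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: the test set extends beyond the training range [2, 10]
features_train = np.array([[2.0], [6.0], [10.0]])
features_test = np.array([[0.0], [6.0], [12.0]])

# clip=True bounds transformed held-out data to feature_range (default (0, 1))
scaler = MinMaxScaler(clip=True)
scaler.fit(features_train)  # parameters still come from the training data only
X_test_clipped = scaler.transform(features_test)

print(X_test_clipped.ravel())  # all values now lie in [0, 1]
```

Clipping keeps the preprocessing free of test-set information, but it discards how far outside the training range a test value lies, so whether it is appropriate depends on the model and the data.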
