Which normalization method, min-max or z-scaling (zero mean unit variance), works best for deep learning?
Question
I have data representing relative counts (0.0-1.0), as shown in the example below, calculated with the formula
cell value (e.g. 23) / column sum (e.g. 1200) = 0.01916
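The column-normalization formula above can be sketched in NumPy; the count matrix here is hypothetical, chosen so the first column sums to 1200 to match the worked example:

```python
import numpy as np

# Hypothetical raw count matrix: rows are samples, columns are features.
counts = np.array([
    [23,   10,  17],
    [50,   40,  33],
    [27,   60,  50],
    [1100, 90, 100],
])

# Divide each cell by its column sum, mirroring the formula above:
# cell value (e.g. 23) / column sum (e.g. 1200) = 0.01916...
relative = counts / counts.sum(axis=0, keepdims=True)

print(relative[0, 0])  # 23 / 1200 ≈ 0.01917
```

Every column of `relative` then sums to 1.0, which is why the resulting features already lie in the 0.0-1.0 range.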
Example data
f1 f2 f3 f5 f6 f7 f8 class
0.266 0.133 0.200 0.133 0.066 0.133 0.066 1
0.250 0.130 0.080 0.160 0.002 0.300 0.111 0
0.000 0.830 0.180 0.016 0.002 0.059 0.080 1
0.300 0.430 0.078 0.100 0.082 0.150 0.170 0
Before applying a deep learning algorithm, I remove features that show high correlation.
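Dropping highly correlated features is commonly done by scanning the upper triangle of the absolute correlation matrix; a minimal pandas sketch, with synthetic data and an assumed threshold of 0.95:

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the example above; the 0.95 cutoff is an assumption.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 4)), columns=["f1", "f2", "f3", "f4"])
df["f5"] = df["f1"] * 0.99 + rng.normal(0, 0.001, 100)  # f5 nearly duplicates f1

corr = df.corr().abs()
# Keep only the upper triangle so each pair is checked once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]

reduced = df.drop(columns=to_drop)
print(to_drop)  # ['f5']
```

Note this keeps the first feature of each correlated pair and drops the later one; which member of the pair to keep is a modeling choice.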
I am unsure which normalization method is correct before building the model:
- Use the data as-is, since it is already scaled (0.0-1.0).
- Apply min-max scaling ( https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html )
- Apply z-scaling ( https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html )
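The two scikit-learn scalers linked above can be compared side by side on the example data; a minimal sketch using two of the feature columns:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two feature columns from the example data.
X = np.array([
    [0.266, 0.133],
    [0.250, 0.130],
    [0.000, 0.830],
    [0.300, 0.430],
])

X_minmax = MinMaxScaler().fit_transform(X)  # each column rescaled to [0, 1]
X_std = StandardScaler().fit_transform(X)   # each column to zero mean, unit variance

print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0. 0.] [1. 1.]
print(X_std.mean(axis=0).round(6))                 # [0. 0.]
```

In either case, fit the scaler on the training split only and reuse the fitted scaler to transform the validation/test data, otherwise statistics leak from the test set.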
When I use classical supervised algorithms, both min-max and z-scaling improve performance. But with deep learning using TensorFlow-GPU, I do not see any significant difference between the two.
Thanks.
Accepted answer
Z-scaling is a good idea when your data is approximately normally distributed, which is often the case.
Min-max scaling is the right thing to do when you expect a largely uniform distribution.
In short, it depends on your data and your neural network.
But both are sensitive to outliers, so you could try median-MAD scaling (median absolute deviation).
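Median-MAD scaling replaces the mean with the median and the standard deviation with the median absolute deviation, so a single extreme value barely shifts the statistics; a minimal sketch with a hypothetical outlier:

```python
import numpy as np

def median_mad_scale(X):
    """Robust scaling: subtract the column median, divide by the column MAD."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return (X - med) / mad

# One feature column; 9.0 is an injected outlier.
X = np.array([[0.266], [0.250], [0.000], [0.300], [9.000]])
print(median_mad_scale(X).ravel())
```

scikit-learn's built-in `RobustScaler` is a close relative (it centers on the median and scales by the interquartile range) and may be preferable when you want a fit/transform API.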
See also: https://stats.stackexchange.com/questions/7757 (data normalization and standardization in neural networks)