Should I normalize my features before throwing them into RNN?

Problem description

I am playing with some demos of recurrent neural networks.

I noticed that the scale of the data in each column differs a lot, so I am considering doing some preprocessing before feeding data batches into my RNN. The close column is the target I want to predict.

     open   high    low     volume  price_change  p_change     ma5    ma10  \
0  20.64  20.64  20.37  163623.62         -0.08     -0.39  20.772  20.721
1  20.92  20.92  20.60  218505.95         -0.30     -1.43  20.780  20.718
2  21.00  21.15  20.72  269101.41         -0.08     -0.38  20.812  20.755
3  20.70  21.57  20.70  645855.38          0.32      1.55  20.782  20.788
4  20.60  20.70  20.20  458860.16          0.10      0.48  20.694  20.806

     ma20      v_ma5     v_ma10     v_ma20  close
0  20.954  351189.30  388345.91  394078.37  20.56
1  20.990  373384.46  403747.59  411728.38  20.64
2  21.022  392464.55  405000.55  426124.42  20.94
3  21.054  445386.85  403945.59  473166.37  21.02
4  21.038  486615.13  378825.52  461835.35  20.70

My question is: is it necessary in my case to preprocess the data with, say, StandardScaler from sklearn? And why?

(You are welcome to edit my question.)

Recommended answer

It will be beneficial to normalize your training data. Feeding the model features with widely different scales causes the network to weight them unequally, which can lead to a false prioritization of some features over others in the learned representation.
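
As a quick illustration, the scale gap in the question's data can be inspected directly with pandas; this is only a sketch, and the file name below is hypothetical:

import pandas as pd

# Compare the spread of the price columns with the volume column; in the
# data shown above, volume is several orders of magnitude larger than the
# prices, so it would dominate an unscaled input.
df = pd.read_csv("stock.csv")  # hypothetical path to the data shown above
print(df[["open", "volume", "close"]].describe().loc[["mean", "std"]])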

Although the broader discussion of data preprocessing remains contested, both on when exactly it is necessary and on how to correctly normalize the data for a given model and application domain, there is a general consensus in machine learning that running a mean subtraction as well as a general normalization preprocessing step is helpful.

In the case of mean subtraction, the mean of each individual feature is subtracted from the data, which can be interpreted geometrically as centering the data around the origin. This holds for every dimension.
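
A minimal numpy sketch of that step, using a couple of rows from the data above as the feature matrix:

import numpy as np

# X has shape (n_samples, n_features); subtracting the column-wise mean
# centers every feature (dimension) around the origin.
X = np.array([[20.64, 163623.62],
              [20.92, 218505.95],
              [21.00, 269101.41]])
X_centered = X - X.mean(axis=0)
print(X_centered.mean(axis=0))  # approximately 0 for every column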

Normalizing the data after the mean subtraction step brings the data dimensions to approximately the same scale. Note that, as mentioned above, the different features lose any prioritization over each other after this step. If you have good reason to believe that the different scales of your features carry important information that the network needs to truly understand the underlying patterns in your dataset, then normalization would be harmful. A standard approach is to scale the inputs to have a mean of 0 and a variance of 1.
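
A sketch of that standard approach with sklearn's StandardScaler, assuming the features live in hypothetical train_df / test_df DataFrames and close is the prediction target; fitting on the training split only avoids leaking statistics from the test period into the inputs:

from sklearn.preprocessing import StandardScaler

# Keep every column except the target as input features.
feature_cols = [c for c in train_df.columns if c != "close"]

scaler = StandardScaler()                             # zero mean, unit variance
X_train = scaler.fit_transform(train_df[feature_cols])
X_test = scaler.transform(test_df[feature_cols])      # reuse training statistics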

Further preprocessing operations may be helpful in specific cases, such as performing PCA or whitening on your data. See the excellent CS231n notes (Setting up the data and the model) for further reference on these topics, as well as a more detailed explanation of the points above.
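
For completeness, a small sketch of the PCA/whitening step with sklearn, assuming X_train / X_test are the already standardized feature matrices from the previous step:

from sklearn.decomposition import PCA

# whiten=True rescales the decorrelated principal components to unit variance.
pca = PCA(whiten=True)
X_train_white = pca.fit_transform(X_train)
X_test_white = pca.transform(X_test)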
