multi-layer perceptron (MLP) architecture: criteria for choosing number of hidden layers and size of the hidden layer?


Problem Description


If we have 10 features, then we can have 10 neural nodes in the input layer. If we have 5 output classes, then we can have 5 nodes in the output layer. But what are the criteria for choosing the number of hidden layers in an MLP, and how many neural nodes should go in one hidden layer?

Solution

How many hidden layers?

A model with zero hidden layers will resolve linearly separable data. So unless you already know your data isn't linearly separable, it doesn't hurt to verify this--why use a more complex model than the task requires? If the data is linearly separable then a simpler technique will work, and a Perceptron will do the job as well.
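As a quick check before reaching for an MLP, one can fit a plain Perceptron and look at its training accuracy. A minimal sketch, assuming scikit-learn; the toy data here is a hypothetical stand-in for your own X and y:

    # Probe for linear separability before building an MLP.
    import numpy as np
    from sklearn.linear_model import Perceptron

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))           # 10 features, as in the question
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # linearly separable by construction

    clf = Perceptron(max_iter=1000, tol=1e-3).fit(X, y)
    print(f"training accuracy: {clf.score(X, y):.3f}")
    # Near-perfect accuracy suggests the data is linearly separable and no
    # hidden layer is needed; clearly lower accuracy suggests trying an MLP.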

Assuming your data does require separation by a non-linear technique, then always start with one hidden layer. Almost certainly that's all you will need. If your data is separable using an MLP, then that MLP probably only needs a single hidden layer. There is theoretical justification for this, but my reason is purely empirical: many difficult classification/regression problems are solved using single-hidden-layer MLPs, yet I don't recall encountering any multiple-hidden-layer MLPs used to successfully model data--whether on ML bulletin boards, in ML textbooks, in academic papers, etc. They exist, certainly, but the circumstances that justify their use are empirically quite rare.


How many nodes in the hidden layer?

From the MLP academic literature, my own experience, etc., I have gathered and often rely upon several rules of thumb (RoT), which I have also found to be reliable guides (i.e., the guidance was accurate, and even when it wasn't, it was usually clear what to do next):

RoT based on improving convergence:

When you begin building the model, err on the side of more nodes in the hidden layer.

Why? First, a few extra nodes in the hidden layer aren't likely to do any harm--your MLP will still converge. On the other hand, too few nodes in the hidden layer can prevent convergence. Think of it this way: additional nodes provide some excess capacity--additional weights to store/release signal to the network during iteration (training, or model building). Second, if you begin with additional nodes in your hidden layer, then it's easy to prune them later (as iteration progresses). This is common, and there are diagnostic techniques to assist you (e.g., the Hinton diagram, which is just a visual depiction of the weight matrices, a 'heat map' of the weight values).
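For example, a crude stand-in for a Hinton diagram can be drawn straight from a fitted network's first weight matrix. A sketch assuming scikit-learn and matplotlib, with a hypothetical toy problem:

    # Visualize input-to-hidden weights as a heat map; rows or columns that
    # stay near zero are candidates for pruning.
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)   # a non-linear toy target

    mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                        random_state=0).fit(X, y)
    W = mlp.coefs_[0]                          # shape: (n_inputs, n_hidden)

    plt.imshow(W, cmap="RdBu", aspect="auto")
    plt.colorbar(label="weight value")
    plt.xlabel("hidden node")
    plt.ylabel("input feature")
    plt.show()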

RoT based on the size of the input layer and the size of the output layer:

A rule of thumb is for the size of this [hidden] layer to be somewhere between the input layer size ... and the output layer size....

To calculate the number of hidden nodes we use a general rule of: (number of inputs + number of outputs) × 2/3
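With the figures from the question, for example, that rule gives (10 + 5) × 2/3 = 10 hidden nodes.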

RoT based on principal components:

Typically, we specify as many hidden nodes as dimensions [principal components] needed to capture 70-90% of the variance of the input data set.
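A minimal sketch of that rule, assuming scikit-learn: fit a PCA, then count the components needed to pass the chosen variance threshold:

    # Suggest a hidden-layer size from the number of principal components
    # needed to explain ~90% of the input variance.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))            # stand-in for your input data

    cum_var = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    n_hidden = int(np.searchsorted(cum_var, 0.90)) + 1  # first index reaching 90%
    print(f"hidden nodes suggested by the PCA rule: {n_hidden}")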

And yet the NN FAQ author calls these rules "nonsense" (literally), because they ignore the number of training instances, the noise in the targets (the values of the response variable), and the complexity of the feature space.

In his view (and it always seemed to me that he knows what he's talking about), choose the number of neurons in the hidden layer based on whether your MLP includes some form of regularization, or early stopping.
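In scikit-learn terms, for instance, both options are single constructor arguments, and with either in place an oversized hidden layer is much less dangerous. The parameter values below are illustrative, not recommendations:

    # An MLP with L2 regularization (alpha) and early stopping; with these,
    # the exact hidden-layer size matters far less.
    from sklearn.neural_network import MLPClassifier

    mlp = MLPClassifier(
        hidden_layer_sizes=(10,),   # start near the input-layer size
        alpha=1e-3,                 # L2 penalty strength
        early_stopping=True,        # hold out part of the training data...
        validation_fraction=0.1,    # ...and stop when its score stops improving
        max_iter=2000,
    )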

The only valid technique for optimizing the number of neurons in the Hidden Layer:

During your model building, test obsessively; testing will reveal the signatures of an "incorrect" network architecture. For instance, if you begin with an MLP having a hidden layer comprised of a small number of nodes (which you will gradually increase as needed, based on test results), your training and generalization error will both be high, caused by bias and underfitting.

Then increase the number of nodes in the hidden layer, one at a time, until the generalization error begins to increase; this time the cause is overfitting and high variance.


In practice, I do it this way:

input layer: the size of my data vector (the number of features in my model) + 1 for the bias node, and not including the response variable, of course

output layer: solely determined by my model: regression (one node) versus classification (number of nodes equal to the number of classes, assuming softmax)

hidden layer: to start, one hidden layer with a number of nodes equal to the size of the input layer. The "ideal" size is more likely to be smaller (i.e., some number of nodes between the number in the input layer and the number in the output layer) rather than larger--again, this is just an empirical observation, and the bulk of this observation is my own experience. If the project justifies the additional time required, then I start with a single hidden layer comprised of a small number of nodes, then (as I explained just above) I add nodes to the hidden layer, one at a time, while calculating the generalization error, training error, bias, and variance. When the generalization error has dipped and just before it begins to increase again, the number of nodes at that point is my choice. See the sketch below.
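As one way to mechanize that search, a sketch assuming scikit-learn, with a toy dataset standing in for your own X and y; validation error serves as the stand-in for generalization error:

    # Grow the hidden layer one node at a time and track train/validation error.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = (np.sin(X[:, 0]) + X[:, 1] ** 2 > 1).astype(int)  # non-linear toy target

    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                                random_state=0)

    best_n, best_err = None, np.inf
    for n_hidden in range(1, X.shape[1] + 1):  # 1 node up to input-layer size
        mlp = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                            random_state=0).fit(X_tr, y_tr)
        train_err = 1 - mlp.score(X_tr, y_tr)
        val_err = 1 - mlp.score(X_val, y_val)
        print(f"{n_hidden:2d} hidden nodes: train={train_err:.3f}  val={val_err:.3f}")
        if val_err < best_err:
            best_n, best_err = n_hidden, val_err

    print(f"chosen hidden-layer size: {best_n}")

Stopping at the very first uptick in validation error is noisy in practice, so the sketch simply keeps the size with the lowest validation error over the sweep.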
