Multi-layer perceptron (MLP) architecture: criteria for choosing the number of hidden layers and the size of the hidden layer?


Problem description

If we have 10 eigenvectors then we can have 10 neural nodes in the input layer. If we have 5 output classes then we can have 5 nodes in the output layer. But what are the criteria for choosing the number of hidden layers in an MLP, and how many neural nodes should go in one hidden layer?

Recommended answer

How many hidden layers?

A model with zero hidden layers will resolve linearly separable data. So unless you already know your data isn't linearly separable, it doesn't hurt to verify this--why use a more complex model than the task requires? If it is linearly separable then a simpler technique will work, but a Perceptron will do the job as well.
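As a quick sanity check of that advice, here is a minimal sketch (my own illustration, not from the original answer; it assumes scikit-learn and uses synthetic data mirroring the question's 10 features and 5 classes). It fits a plain Perceptron first: if that already fits the training data well, the problem looks linearly separable and a hidden layer may be unnecessary.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# Hypothetical data: 10 features, 5 classes, mirroring the question's setup.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=8,
                           n_classes=5, random_state=0)

clf = Perceptron(max_iter=1000, tol=1e-3, random_state=0).fit(X, y)
print("Perceptron training accuracy:", clf.score(X, y))
# If this is (near) 1.0, the classes look linearly separable and a linear
# model is enough; otherwise a non-linear model such as an MLP is warranted.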

Assuming your data does require separation by a non-linear technique, then always start with one hidden layer. Almost certainly that's all you will need. If your data is separable using an MLP, then that MLP probably only needs a single hidden layer. There is theoretical justification for this (the universal approximation theorem: a single hidden layer with enough units can approximate any continuous function on a compact domain), but my reason is purely empirical: many difficult classification/regression problems are solved using single-hidden-layer MLPs, yet I don't recall encountering any multiple-hidden-layer MLPs used to successfully model data--whether on ML bulletin boards, in ML textbooks, academic papers, etc. They exist, certainly, but the circumstances that justify their use are empirically quite rare.


How many nodes in the hidden layer?

From the MLP academic literature, my own experience, etc., I have gathered and often rely upon several rules of thumb (RoT), which I have also found to be reliable guides (i.e., the guidance was accurate, and even when it wasn't, it was usually clear what to do next):

RoT based on improving convergence:

When you begin the model building, err on the side of more nodes in the hidden layer.

Why? First, a few extra nodes in the hidden layer aren't likely to do any harm--your MLP will still converge. On the other hand, too few nodes in the hidden layer can prevent convergence. Think of it this way: additional nodes provide some excess capacity--additional weights to store/release signal to the network during iteration (training, or model building). Second, if you begin with additional nodes in your hidden layer, then it's easy to prune them later (as the iterations progress). This is common, and there are diagnostic techniques to assist you (e.g., the Hinton diagram, which is just a visual depiction of the weight matrices, a 'heat map' of the weight values).
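As an illustration of that kind of diagnostic, here is a minimal sketch (my own, assuming scikit-learn and matplotlib; the data is synthetic and the 20-node hidden layer is arbitrary). It plots the trained input-to-hidden weight matrix as a heat map, in the spirit of a Hinton diagram, so hidden nodes whose weights stay near zero stand out as pruning candidates.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_classes=5, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                    random_state=0).fit(X, y)

weights = mlp.coefs_[0]            # input-to-hidden weight matrix, shape (10, 20)
plt.imshow(weights, cmap="coolwarm", aspect="auto")
plt.xlabel("hidden node")
plt.ylabel("input feature")
plt.colorbar(label="weight value")
plt.title("Heat map of input-to-hidden weights")
plt.show()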

RoTs based on the size of the input layer and the size of the output layer:

A rule of thumb is for the size of this [hidden] layer to be somewhere between the input layer size ... and the output layer size ....

To calculate the number of hidden nodes we use a general rule of: (number of inputs + outputs) x 2/3.
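Applied to the sizes in the question (10 inputs, 5 outputs), that rule works out as follows:

# Worked example of the (inputs + outputs) * 2/3 rule of thumb.
n_inputs, n_outputs = 10, 5
n_hidden = round((n_inputs + n_outputs) * 2 / 3)   # (10 + 5) * 2/3 = 10
print(n_hidden)  # 10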

RoT based on principal components:

Typically, we specify as many hidden nodes as dimensions [principal components] needed to capture 70-90% of the variance of the input data set.
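A minimal sketch of that rule (my own, using scikit-learn PCA on synthetic data; the 90% threshold is one point in the quoted 70-90% range) counts the principal components needed to reach the target explained variance and uses that count as the hidden-layer size:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=1000, n_features=10, n_informative=8,
                           random_state=0)

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_hidden = int(np.searchsorted(cumulative, 0.90) + 1)  # components for 90% variance
print("suggested hidden nodes:", n_hidden)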

And yet the NN FAQ author calls these rules "nonsense" (literally) because they ignore the number of training instances, the noise in the targets (values of the response variables), and the complexity of the feature space.

In his view (and it always seemed to me that he knows what he's talking about), choose the number of neurons in the hidden layer based on whether your MLP includes some form of regularization, or early stopping.

The only valid technique for optimizing the number of hidden-layer neurons:

During your model building, test obsessively; testing will reveal the signatures of "incorrect" network architecture. For instance, if you begin with an MLP having a hidden layer comprised of a small number of nodes (which you will gradually increase as needed, based on test results), your training and generalization error will both be high, caused by bias and underfitting.

Then increase the number of nodes in the hidden layer, one at a time, until the generalization error begins to increase, this time due to overfitting and high variance.

In practice, I do it this way:

Input layer: the size of my data vector (the number of features in my model) + 1 for the bias node, and not including the response variable, of course.

Output layer: solely determined by my model: regression (one node) versus classification (number of nodes equal to the number of classes, assuming softmax).

Hidden layer: to start, one hidden layer with a number of nodes equal to the size of the input layer. The "ideal" size is more likely to be smaller (i.e., some number of nodes between the number in the input layer and the number in the output layer) rather than larger--again, this is just an empirical observation, and the bulk of this observation is my own experience. If the project justifies the additional time required, then I start with a single hidden layer comprised of a small number of nodes, then (as I explained just above) I add nodes to the hidden layer, one at a time, while calculating the generalization error, training error, bias, and variance. When the generalization error has dipped and just before it begins to increase again, the number of nodes at that point is my choice.
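Here is a minimal sketch of that procedure (my own code, not from the original answer; it assumes scikit-learn, synthetic data, and an illustrative upper bound of 20 nodes): grow the hidden layer one node at a time, track training and validation (generalization) error, and keep the size reached just before the validation error starts climbing again.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_informative=8,
                           n_classes=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

best_size, best_val_err = None, float("inf")
for n_hidden in range(1, 21):                       # grow the hidden layer node by node
    mlp = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                        random_state=0).fit(X_train, y_train)
    train_err = 1 - mlp.score(X_train, y_train)     # high when underfitting (bias)
    val_err = 1 - mlp.score(X_val, y_val)           # rises again when overfitting (variance)
    print(f"{n_hidden:2d} hidden nodes: train err {train_err:.3f}, val err {val_err:.3f}")
    if val_err < best_val_err:
        best_size, best_val_err = n_hidden, val_err

print("chosen hidden-layer size:", best_size)

With regularization or early stopping in place (per the NN FAQ advice above), the exact stopping point matters less, since excess capacity is reined in during training rather than by the architecture alone.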
