What is num_units in tensorflow BasicLSTMCell?

Question

In the MNIST LSTM example, I don't understand what "hidden layer" means. Is it the imaginary layer formed when you represent an unrolled RNN over time?

And why is num_units = 128 in most cases?

Answer

The number of hidden units is a direct representation of the learning capacity of a neural network -- it reflects the number of learned parameters. The value 128 was likely selected arbitrarily or empirically. You can change that value experimentally and rerun the program to see how it affects the training accuracy (you can get better than 90% test accuracy with a lot fewer hidden units). Using more units makes it more likely to memorize the complete training set perfectly (although it will take longer, and you run the risk of over-fitting).
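For concreteness, here is a minimal sketch of where num_units enters a typical setup. The TF 1.x calls are the ones BasicLSTMCell belongs to; the batch/time/input sizes are hypothetical, not taken from the question:

    import tensorflow as tf  # TF 1.x, as in the original MNIST LSTM example

    num_units = 128   # the value under discussion; try e.g. 32 and compare accuracy
    batch_size = 64   # hypothetical
    time_steps = 28   # e.g. one MNIST image row fed in per step
    input_dim = 28    # e.g. 28 pixels per row

    inputs = tf.placeholder(tf.float32, [batch_size, time_steps, input_dim])
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

    # outputs: [batch_size, time_steps, num_units]
    # state:   the final (c, h) pair, each of shape [batch_size, num_units]
    outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)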

The key thing to understand, which is somewhat subtle in the famous Colah's blog post (search for "each line carries an entire vector"), is that X is an array of data (nowadays often called a tensor) -- it is not meant to be a scalar value. Where, for example, the tanh function is shown, it is meant to imply that the function is broadcast across the entire array (an implicit for loop) -- and not simply performed once per time-step.
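A toy illustration of that broadcasting (the vector below is made up for the example):

    import numpy as np

    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # hypothetical vector X
    # np.tanh is applied elementwise -- the implicit for loop:
    print(np.tanh(x))
    # equivalent to writing the loop out explicitly:
    print([np.tanh(v) for v in x])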

As such, the hidden units represent tangible storage within the network, which is manifest primarily in the size of the weights array. And because an LSTM actually does have a bit of its own internal storage separate from the learned model parameters, it has to know how many units there are -- which ultimately needs to agree with the size of the weights. In the simplest case, an RNN has no internal storage -- so it doesn't even need to know in advance how many "hidden units" it is being applied to.
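You can see how num_units drives the weight sizes by printing the cell's trainable variables. This is a sketch; the sizes are hypothetical, and the kernel shape [input_dim + num_units, 4 * num_units] reflects my reading of how BasicLSTMCell stacks its four gates:

    import tensorflow as tf

    num_units, input_dim = 128, 28                 # hypothetical sizes
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
    inputs = tf.placeholder(tf.float32, [None, 10, input_dim])
    _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

    for v in tf.trainable_variables():
        print(v.name, v.shape)
    # expected: a kernel of shape (input_dim + num_units, 4 * num_units)
    # = (156, 512) for the four LSTM gates, plus a bias of shape (512,)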

  • A good answer to a similar question here.
  • You can look at the source for BasicLSTMCell in TensorFlow to see exactly how this is used.

Side note: This notation is very common in statistics and machine learning, and in other fields that process large batches of data with a common formula (3D graphics is another example). It takes a bit of getting used to for people who expect to see their for loops written out explicitly.
