Why is softmax not used in hidden layers


Problem description

I have read the answer given here. My exact question pertains to the accepted answer:

  1. Variables independence: a lot of regularization and effort is put into keeping the variables independent, uncorrelated, and quite sparse. If you use a softmax layer as a hidden layer, then you will keep all of your nodes (hidden variables) linearly dependent, which may result in many problems and poor generalization.

What complications arise from forgoing variable independence in hidden layers? Please provide at least one example. I know hidden-variable independence helps a lot in codifying backpropagation, but backpropagation can be codified for softmax as well (please verify whether or not I am correct in this claim; as far as I can tell the equations work out, hence the claim).
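For reference, the gradient the question alludes to can indeed be written down. A minimal sketch of the standard softmax Jacobian (my own addition, not part of the linked answer), which also makes the coupling between nodes explicit:

```latex
% Softmax over logits z_1, ..., z_n and its Jacobian.
% The off-diagonal terms are non-zero and the outputs sum to 1,
% so pushing one output up necessarily pushes the others down.
\[
s_i = \frac{e^{z_i}}{\sum_{k=1}^{n} e^{z_k}}, \qquad
\frac{\partial s_i}{\partial z_j} = s_i\,(\delta_{ij} - s_j), \qquad
\sum_{i=1}^{n} s_i = 1 .
\]
```

So backpropagation through a softmax is mechanically possible; the quoted objection is about the dependence encoded in the off-diagonal terms, not about whether the derivative can be computed.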

  2. Training issue: try to imagine that, to make your network work better, you have to make part of the activations of your hidden layer a little bit lower. Then, automatically, you are making the rest of them have a higher mean activation, which might in fact increase the error and harm your training phase.

I don't understand how you achieve that kind of flexibility even with sigmoid hidden neurons, where you can fine-tune the activation of a particular neuron, which is precisely gradient descent's job. So why are we even worried about this issue? If you can implement backprop, the rest will be taken care of by gradient descent. Fine-tuning the weights so as to make the activations proper is not something you would want to do, even if you could (which you can't). (Kindly correct me if my understanding is wrong here.)

  3. Mathematical problem: by creating constraints on the activations of your model you decrease its expressive power without any logical explanation. The strive for having all activations the same is not worth it, in my opinion.

Please explain this point.

  4. Batch normalization: I understand this, no question here.

Answer

1/2. I don't think you have a clue of what the author is trying to say. Imagine a layer with 3 nodes. Two of these nodes have an error responsibility of 0 with respect to the output error, so there is one node that should be adjusted. But if you want to improve the output of node 0, then you immediately affect nodes 1 and 2 in that layer, possibly making the output even more wrong.
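A minimal numerical sketch of that coupling (my own illustration with NumPy; the logit values and the bump to node 0 are made up): nudging a single logit changes every output of the layer, because the three outputs must still sum to 1.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])      # logits of the 3 hidden nodes
before = softmax(z)

z_adjusted = z.copy()
z_adjusted[0] += 1.0               # try to "improve" only node 0
after = softmax(z_adjusted)

print(before)   # [0.0900 0.2447 0.6652]
print(after)    # [0.2119 0.2119 0.5761]  <- nodes 1 and 2 moved as well
```

With an element-wise activation such as sigmoid or ReLU, the same nudge would leave the other two nodes untouched.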

  Fine-tuning the weights so as to make the activations proper is not something you would want to do, even if you could (which you can't). (Kindly correct me if my understanding is wrong here.)

That is the definition of backpropagation. That is exactly what you want. Neural networks rely on activations (which are non-linear) to map a function.

3. You're basically saying to every neuron: "hey, your output cannot be higher than x, because some other neuron in this layer already has value y". Because all neurons in a softmax layer must have a total activation of 1, no neuron can rise above a certain value. For small layers this is a small problem, but for big layers it is a big problem. Imagine a layer with 100 neurons whose total output must be 1. The average value of those neurons will be 0.01, which means you are making the network rely on its connections (because the activations will, on average, stay very low), whereas other activation functions output (or take as input) values in the range (0:1) or (-1:1).
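A minimal sketch of the scale argument (my own illustration with NumPy; the random pre-activations are made up): with 100 units, softmax outputs are pinned around 0.01, while a sigmoid over the same pre-activations spans most of (0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=100)                      # pre-activations of a 100-unit layer

softmax_out = np.exp(z - z.max())
softmax_out /= softmax_out.sum()              # outputs forced to share a budget of 1
sigmoid_out = 1.0 / (1.0 + np.exp(-z))        # element-wise, no shared budget

print(softmax_out.mean(), softmax_out.max())  # mean exactly 0.01, max still small
print(sigmoid_out.mean(), sigmoid_out.max())  # mean ~0.5, values spread over (0, 1)
```

The next layer therefore sees inputs that are, on average, far smaller than after a sigmoid, which is the "relying on connections" point above: the weights have to compensate for the shrunken activations.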
