Why do we want to scale outputs when using dropout?

Question

From the dropout paper:

"The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2. This ensures that for any hidden unit the expected output (under the distribution used to drop units at training time) is the same as the actual output at test time."

Why do we want to preserve the expected output? If we use ReLU activations, linear scaling of weights or activations results in linear scaling of network outputs and does not have any effect on the classification accuracy.

What am I missing?

Answer

To be precise, we want to preserve not the "expected output" but the expected value of the output; that is, we want to make up for the difference between the training phase (when we don't pass the values of some nodes) and the testing phase by preserving the mean (expected) value of the outputs.
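In practice, this expectation is often preserved the other way around: instead of multiplying the weights by p at test time, the retained activations are divided by p during training ("inverted dropout"), so the test-time network needs no rescaling at all. A rough sketch of that convention, assuming a keep probability p_keep (my naming, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, p_keep=0.8):
    """Inverted dropout: zero units with probability 1 - p_keep and rescale
    the survivors by 1 / p_keep, so E[output] equals x."""
    mask = rng.random(x.shape) < p_keep
    return x * mask / p_keep

def dropout_test(x):
    """At test time the layer is simply the identity; no rescaling needed."""
    return x

x = np.ones((1_000_000,))
print(dropout_train(x).mean())   # ~ 1.0, matches the test-time expectation
print(dropout_test(x).mean())    #   1.0
```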

In the case of ReLU activations this scaling indeed leads to linear scaling of the outputs (when they are positive), but why do you think it doesn't affect the final accuracy of a classification model? At the end, we usually apply either a softmax or a sigmoid, which are non-linear and depend on this scaling.
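As a small illustration of that last point (my own example, not from the answer), a softmax layer is not invariant to rescaling its inputs, so the scale of the hidden activations does propagate into the predicted probabilities:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])
print(softmax(logits))        # approx. [0.63, 0.23, 0.14]
print(softmax(0.5 * logits))  # approx. [0.48, 0.29, 0.23]: scaling is not a no-op
```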
