Is the Keras implementation of dropout correct?


Question

The Keras implementation of dropout references this paper.

The following excerpt is from that paper:

The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2.

The Keras documentation mentions that dropout is only used at train time, and the following line from the Dropout implementation

x = K.in_train_phase(K.dropout(x, level=self.p), x)

seems to indicate that indeed outputs from layers are simply passed along during test time.
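
For illustration, here is a rough stand-alone analogue of that line (a sketch, not the Keras source; `toy_in_train_phase` is a made-up name):

import numpy as np

def toy_in_train_phase(train_output, test_output, training):
    # Rough analogue of K.in_train_phase: return the first argument during
    # training and the second one during testing / inference.
    return train_output if training else test_output

x = np.ones((2, 3))
dropped = x * (np.random.rand(*x.shape) >= 0.5)   # placeholder for K.dropout(x, level=0.5);
                                                  # the real backend op also rescales the kept values
out = toy_in_train_phase(dropped, x, training=False)
assert np.array_equal(out, x)                     # at test time x is passed along unchanged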

Further, I cannot find code which scales down the weights after training is complete as the paper suggests. My understanding is this scaling step is fundamentally necessary to make dropout work, since it is equivalent to taking the expected output of intermediate layers in an ensemble of "subnetworks." Without it, the computation can no longer be considered sampling from this ensemble of "subnetworks."
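
To make that expectation argument concrete, here is the usual calculation (a sketch in the paper's notation, where p is the probability of retaining a unit and the nonlinearity is ignored):

% Pre-activation of a unit whose inputs x_i carry retain masks m_i ~ Bernoulli(p):
\mathbb{E}\Big[\sum_i m_i\, w_i\, x_i\Big] = \sum_i \mathbb{E}[m_i]\, w_i\, x_i = p \sum_i w_i\, x_i
% so multiplying the outgoing weights (equivalently, the unit outputs) by p at
% test time reproduces the expected training-time pre-activation.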

My question, then, is where is this scaling effect of dropout implemented in Keras, if at all?

Update 1: Ok, so Keras uses inverted dropout, though it is called dropout in the Keras documentation and code. The link http://cs231n.github.io/neural-networks-2/#reg doesn't seem to indicate that the two are equivalent. Nor does the answer at https://stats.stackexchange.com/questions/205932/dropout-scaling-the-activation-versus-inverting-the-dropout. I can see that they do similar things, but I have yet to see anyone say they are exactly the same. I think they are not.

So a new question: Are dropout and inverted dropout equivalent? To be clear, I'm looking for mathematical justification for saying they are or aren't.

Answer

Yes, it is implemented properly. Since dropout was invented, people have also improved it from the implementation point of view, and Keras uses one of these techniques. It is called inverted dropout, and you can read about it here.
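
For a side-by-side view, here is a minimal NumPy sketch of the two variants (an illustration, not the Keras source); `rate` denotes the drop probability, i.e. 1 - p in the paper's notation:

import numpy as np

def classic_dropout(x, rate, training):
    # Paper version: drop units at train time, scale by the keep
    # probability (1 - rate) at test time.
    if training:
        return x * (np.random.rand(*x.shape) >= rate)
    return x * (1.0 - rate)

def inverted_dropout(x, rate, training):
    # Keras-style version: drop AND rescale by 1 / (1 - rate) at train time,
    # so test time is a plain pass-through (cf. the K.in_train_phase line above).
    if training:
        return x * (np.random.rand(*x.shape) >= rate) / (1.0 - rate)
    return x

In both variants the test-time output equals the expected train-time output of the same unit; they differ only in where the constant factor is applied.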

Update:

To be honest, in the strict mathematical sense these two approaches are not equivalent. In the inverted case you multiply every hidden activation by the reciprocal of the keep probability. But because that operation is linear, it is equivalent to multiplying all the gradients by the same factor, so you would have to set a different learning rate to compensate. From this point of view the approaches differ. From a practical point of view, however, they are equivalent, because:

  1. If you use a method that sets the learning rate adaptively (such as RMSProp or Adagrad), it changes almost nothing in the algorithm.
  2. If you set the learning rate manually, you have to take into account the stochastic nature of dropout and the fact that some neurons are switched off during the training phase (but not during the test/evaluation phase), so you have to rescale the learning rate to overcome this difference. Probability theory gives us the best rescaling factor: the reciprocal of the keep probability, which makes the expected length of the loss-function gradient the same in the training and test/evaluation phases.

Of course, both points above are about the inverted dropout technique.
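
Spelled out (a sketch with keep probability q = 1 - rate and Bernoulli mask m):

% With keep probability q = 1 - rate and mask m_i ~ Bernoulli(q):
%   classic dropout:   train  h_i = m_i a_i,       test  h_i = q a_i
%   inverted dropout:  train  h_i = m_i a_i / q,   test  h_i = a_i
% In both schemes the test-time value equals the expected training-time value:
\mathbb{E}[m_i a_i] = q\, a_i, \qquad \mathbb{E}\big[m_i a_i / q\big] = a_i
% The two training signals differ only by the constant factor 1/q, which is the
% rescaling that the learning-rate remark above refers to.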
