Is the Keras implementation of dropout correct?


Problem description


The Keras implementation of dropout references this paper.

The following is an excerpt from that paper:


The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2.


The Keras documentation mentions that dropout is only used at train time, and the following line from the Dropout implementation

x = K.in_train_phase(K.dropout(x, level=self.p), x)


seems to indicate that indeed outputs from layers are simply passed along during test time.
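
For reference, here is a rough NumPy paraphrase of that line (a sketch, not the Keras source, assuming K.in_train_phase(a, b) returns a during training and b otherwise, and that K.dropout(x, level) masks and rescales activations, with level being the drop probability as in Dropout(p)):

import numpy as np

def dropout_forward(x, level, training, rng=np.random.default_rng(0)):
    # Sketch of: x = K.in_train_phase(K.dropout(x, level=self.p), x)
    if not training or level == 0.0:
        return x                              # test phase: input passes through unchanged
    keep_prob = 1.0 - level
    mask = rng.binomial(1, keep_prob, size=x.shape)
    # Surviving activations are rescaled by 1 / keep_prob at train time;
    # this is the "inverted dropout" scaling mentioned in Update 1 below.
    return x * mask / keep_prob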


Further, I cannot find code which scales down the weights after training is complete as the paper suggests. My understanding is this scaling step is fundamentally necessary to make dropout work, since it is equivalent to taking the expected output of intermediate layers in an ensemble of "subnetworks." Without it, the computation can no longer be considered sampling from this ensemble of "subnetworks."
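
To illustrate the expectation argument with a toy example (my own sketch, using the paper's convention where p is the probability of keeping a unit): averaging a unit's masked activation over many dropout masks gives roughly p times the original activation, which is exactly what multiplying the outgoing weights by p at test time reproduces, and which inverted dropout instead bakes in by rescaling with 1/p during training.

import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0])      # toy activations of one hidden layer
p = 0.8                                  # paper's convention: probability of keeping a unit

masks = rng.binomial(1, p, size=(200_000, x.size))   # many sampled dropout masks
print((masks * x).mean(axis=0))          # ~ p * x: the expected masked activation
print(p * x)                             # the paper's test-time scaling
print((masks * x / p).mean(axis=0))      # inverted dropout: already ~ x in expectation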


My question, then, is where is this scaling effect of dropout implemented in Keras, if at all?


Update 1: Ok, so Keras uses inverted dropout, though it is called dropout in the Keras documentation and code. The link http://cs231n.github.io/neural-networks-2/#reg doesn't seem to indicate that the two are equivalent. Nor does the answer at https://stats.stackexchange.com/questions/205932/dropout-scaling-the-activation-versus-inverting-the-dropout. I can see that they do similar things, but I have yet to see anyone say they are exactly the same. I think they are not.


So a new question: Are dropout and inverted dropout equivalent? To be clear, I'm looking for mathematical justification for saying they are or aren't.

Answer


Yes, it is implemented properly. Since dropout was invented, people have also improved it from the implementation point of view, and Keras uses one of these techniques. It is called inverted dropout; the cs231n notes linked in the question describe it.
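
To make the terminology concrete, here is a minimal side-by-side sketch of the two schemes for a single dense layer (my own illustration using the paper's convention where p is the keep probability; it is not Keras code):

import numpy as np

def classic_dropout(x, w, p, training, rng):
    # Paper's scheme: mask at train time, multiply the weights by p at test time.
    if training:
        mask = rng.binomial(1, p, size=x.shape)
        return (x * mask) @ w
    return x @ (w * p)                   # test: scaled-down weights

def inverted_dropout(x, w, p, training, rng):
    # Keras-style scheme: mask and rescale by 1/p at train time, leave test time alone.
    if training:
        mask = rng.binomial(1, p, size=x.shape)
        return (x * mask / p) @ w
    return x @ w                         # test: weights and activations untouched

In expectation the two training-time forward passes differ by exactly a factor of 1/p, which is the discrepancy the update below is about.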

Update:


To be honest, in the strict mathematical sense the two approaches are not equivalent. In the inverted case you multiply every hidden activation by the reciprocal of the keep probability, and because differentiation is linear, that is equivalent to multiplying all the gradients by the same factor. To compensate for this difference you would have to use a different learning rate, so from this point of view the two approaches differ. From a practical point of view, however, they are equivalent, because:

  1. If you use a method that sets the learning rate adaptively (like RMSProp or Adagrad), it makes almost no difference to the algorithm.
  2. If you set the learning rate by hand, you have to take into account the stochastic nature of dropout and the fact that some neurons are switched off during the training phase (which does not happen during the test/evaluation phase), so you have to rescale the learning rate to compensate for this difference. Probability theory gives the best rescaling factor: the reciprocal of the keep probability, which makes the expected length of the loss-function gradient the same in the training and the test/evaluation phases (a rough numeric check follows below).


Of course, both of the points above are about the inverted dropout technique.
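
As a rough numeric check of the rescaling factor mentioned in point 2 (my own sketch, with p as the keep probability and the upstream gradient taken to be 1 for simplicity): the expected gradient with respect to a weight is about p times the input activation under plain masking, but about the full activation under the inverted 1/p rescaling, i.e. larger by exactly that reciprocal factor.

import numpy as np

rng = np.random.default_rng(0)
x, p = 2.0, 0.8                          # one input activation and the keep probability
masks = rng.binomial(1, p, size=1_000_000)

# d(loss)/dw for y = mask * x * w versus y = (mask / p) * x * w,
# averaged over many dropout masks (upstream gradient taken as 1)
print((masks * x).mean())                # plain masking:      ~ p * x
print((masks * x / p).mean())            # inverted rescaling: ~ x, i.e. 1/p times larger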

