Why is the input scaled in tf.nn.dropout in TensorFlow?


Problem description

I can't understand why dropout works like this in TensorFlow. The CS231n course notes say that "dropout is implemented by only keeping a neuron active with some probability p (a hyperparameter), or setting it to zero otherwise." You can also see this in the picture from the same site.

From the TensorFlow documentation: "With probability keep_prob, outputs the input element scaled up by 1 / keep_prob, otherwise outputs 0."

Now, why is the input element scaled up by 1/keep_prob? Why not just keep the input element as it is with probability keep_prob, without scaling it by 1/keep_prob?
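For concreteness, here is a minimal NumPy sketch (my own illustration, not TensorFlow's actual implementation) of the behaviour the documentation describes:

```python
import numpy as np

# Minimal sketch of the documented behaviour (not TensorFlow's actual code):
# each element survives with probability keep_prob and, if it survives,
# it is scaled up by 1 / keep_prob.
def dropout(x, keep_prob, rng=np.random.default_rng(0)):
    mask = rng.random(x.shape) < keep_prob       # True with probability keep_prob
    return np.where(mask, x / keep_prob, 0.0)

x = np.ones(8)
print(dropout(x, keep_prob=0.5))
# e.g. [2. 0. 2. 2. 0. 0. 2. 0.] -- surviving elements are doubled, so the
# expected value of every output element is still 1.0
```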

Recommended answer

This scaling enables the same network to be used for training (with keep_prob < 1.0) and evaluation (with keep_prob == 1.0). From the Dropout paper:

The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2.
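In other words, you can either scale the weights down by p at test time (as the paper describes) or scale the kept activations up by 1/p at training time; either choice makes the expected training-time activation match the test-time activation. A rough numerical check of that equivalence (my own sketch, in NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # activations of the previous layer
w = rng.normal(size=1000)          # outgoing weights of one unit
p = 0.5                            # probability of keeping a unit

masks = rng.random((5000, 1000)) < p          # many independent dropout masks

# Paper's scheme: no scaling during training, weights multiplied by p at test time.
train_paper = ((masks * x) @ w).mean()        # average pre-activation over masks
test_paper = (p * w) @ x

# Inverted dropout (what tf.nn.dropout does): kept activations scaled by 1/p
# during training, weights left untouched at test time.
train_inverted = ((masks * x / p) @ w).mean()
test_inverted = w @ x

print(train_paper, test_paper)        # approximately equal
print(train_inverted, test_inverted)  # approximately equal
```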

Rather than adding ops to scale down the weights by keep_prob at test time, the TensorFlow implementation adds an op to scale up the weights by 1. / keep_prob at training time. The effect on performance is negligible, and the code is simpler (because we use the same graph and treat keep_prob as a tf.placeholder() that is fed a different value depending on whether we are training or evaluating the network).
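A minimal sketch of that pattern, assuming the TF 1.x-style API the answer refers to (tf.placeholder and the keep_prob argument; newer TensorFlow versions expose a rate argument instead):

```python
import tensorflow as tf   # assumes a TF 1.x-style graph API

x = tf.placeholder(tf.float32, [None, 128])
keep_prob = tf.placeholder(tf.float32)            # fed a different value per run
dropped = tf.nn.dropout(x, keep_prob=keep_prob)   # same graph for train and eval

with tf.Session() as sess:
    data = [[1.0] * 128]
    # Training: keep_prob < 1.0, surviving activations are scaled by 1/keep_prob.
    train_out = sess.run(dropped, feed_dict={x: data, keep_prob: 0.5})
    # Evaluation: keep_prob == 1.0, so dropout is effectively a no-op.
    eval_out = sess.run(dropped, feed_dict={x: data, keep_prob: 1.0})
```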
