Ordering of batch normalization and dropout?

Problem Description

The original question was specifically about the TensorFlow implementation. However, the answer applies to implementations in general, and this general answer is also the correct answer for TensorFlow.

When using batch normalization and dropout in TensorFlow (specifically using contrib.layers), do I need to be worried about the ordering?

It seems possible that if I use dropout followed immediately by batch normalization there might be trouble. For example, if the shift in batch normalization is trained on the larger-scale numbers of the training outputs, but that same shift is then applied to the smaller-scale numbers (smaller because of the compensation for having more active outputs) seen without dropout during testing, then that shift may be off. Does the TensorFlow batch normalization layer automatically compensate for this? Or does this not happen for some reason I'm missing?
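
For what it's worth, the mismatch being described here can be sketched numerically. Below is a rough NumPy illustration of the idea, not TensorFlow's actual internals; the use of inverted dropout scaling and the chosen keep probability are assumptions. It compares the statistics a batch norm layer would see during training, when dropout sits right before it, with the un-dropped activations it sees at test time.

import numpy as np

# Rough illustration only: statistics of a batch-norm input when inverted
# dropout (kept units scaled by 1/keep_prob) is applied right before it.
rng = np.random.default_rng(0)
acts = rng.normal(loc=1.0, scale=1.0, size=100_000)  # hypothetical layer outputs
keep_prob = 0.5

train_view = acts * rng.binomial(1, keep_prob, acts.shape) / keep_prob  # BN input during training
test_view = acts                                                        # BN input at test time

print("train mean/std:", train_view.mean(), train_view.std())
print("test  mean/std:", test_view.mean(), test_view.std())
# The means roughly agree, but the training-time variance is noticeably larger,
# so normalizing test activations with statistics gathered during training
# changes their scale.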

Also, are there other pitfalls to look out for when using these two together? For example, assuming I'm using them in the correct order with regard to the above (assuming there is a correct order), could there be trouble with using both batch normalization and dropout on multiple successive layers? I don't immediately see a problem with that, but I might be missing something.

Thanks so much!

Update:

An experimental test seems to suggest that ordering does matter. I ran the same network twice, with only the batch norm and dropout order reversed. When dropout comes before the batch norm, validation loss seems to go up as training loss goes down. They both go down in the other case. But in my case the movements are slow, so things may change after more training and it was just a single test. A more definitive and informed answer would still be appreciated.

Recommended Answer

In Ioffe and Szegedy 2015, the authors state that "we would like to ensure that for any parameter values, the network always produces activations with the desired distribution". So the batch normalization layer is actually inserted right after a Conv layer/Fully Connected layer, but before feeding into the ReLU (or any other kind of) activation. See this video at around the 53-minute mark for more details.

As far as dropout goes, I believe dropout is applied after the activation layer. In the dropout paper, figure 3b, the dropout factor/probability matrix r(l) for hidden layer l is applied to y(l), where y(l) is the result of applying the activation function f.
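
To make that concrete, here is a small sketch of that per-layer computation. The shapes, the keep probability p, and ReLU as the activation f are illustrative assumptions; the paper's classical dropout scales by p at test time rather than using inverted dropout.

import numpy as np

def dense_dropout_layer(x, W, b, p=0.5, training=True, rng=np.random.default_rng(0)):
    y = np.maximum(0.0, x @ W + b)            # y(l) = f(W x + b), with f = ReLU here
    if training:
        r = rng.binomial(1, p, size=y.shape)  # r(l) ~ Bernoulli(p)
        return r * y                          # dropout is applied to y(l), i.e. after the activation
    return p * y                              # classical test-time scaling by p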

So in summary, the order of using batch normalization and dropout is:

-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC ->
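
As a minimal sketch of this ordering in code (written with tf.keras layers rather than the now-deprecated contrib.layers the question mentions; the layer sizes, dropout rate, and input shape are arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding="same", input_shape=(28, 28, 1)),  # CONV
    tf.keras.layers.BatchNormalization(),                                    # BatchNorm
    tf.keras.layers.ReLU(),                                                  # ReLU (or other activation)
    tf.keras.layers.Dropout(0.5),                                            # Dropout
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),                                               # FC
])
model.summary()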
