Different learning rates affect the BatchNorm setting. Why?
Problem Description
I am using a BatchNorm layer. I know the meaning of the use_global_stats setting: it is usually set to false for training and to true for testing/deployment. This is my setting for the testing phase.
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "bnorm1"
  top: "bnorm1"
  scale_param {
    bias_term: true
    filler {
      value: 1
    }
    bias_filler {
      value: 0.0
    }
  }
}
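For comparison, a minimal sketch of what the corresponding training-phase BatchNorm definition could look like (same layer names assumed; in stock Caffe, use_global_stats may also be omitted entirely, since it defaults to false in the TRAIN phase and true in the TEST phase):

layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  batch_norm_param {
    # during training, normalize with the per-mini-batch statistics
    use_global_stats: false
  }
}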
In solver.prototxt, I used the Adam method. I found an interesting problem in my case. If I choose base_lr: 1e-3, I get good performance when I set use_global_stats: false in the testing phase. However, if I choose base_lr: 1e-4, I get good performance when I set use_global_stats: true in the testing phase. This suggests that base_lr affects the BatchNorm setting (even though I used the Adam method). Could you suggest any reason for that? Thanks all.
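For reference, a minimal sketch of the kind of solver.prototxt described above; the net path and the hyperparameters other than base_lr and type are assumptions for illustration, not values given in the question:

net: "train_val.prototxt"   # hypothetical network definition path
type: "Adam"
base_lr: 1e-3               # the question compares 1e-3 against 1e-4
momentum: 0.9
momentum2: 0.999
lr_policy: "fixed"
max_iter: 100000
snapshot_prefix: "snapshots/adam"
solver_mode: GPU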
Recommended Answer
AFAIK, the learning rate does not directly affect the learned parameters of the "BatchNorm" layer. Indeed, Caffe forces lr_mult for all internal parameters of this layer to be zero, regardless of the solver's base_lr or type.
However, you might encounter a case where the adjacent layers converge to different points depending on the base_lr you are using, and this indirectly causes the "BatchNorm" layer to behave differently.
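This is often made explicit in prototxt files by pinning the three internal blobs of the "BatchNorm" layer (running mean, running variance, and the moving-average scale factor); a minimal sketch, equivalent to what Caffe enforces internally:

layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bnorm1"
  # the three internal blobs hold accumulated statistics, not learned
  # weights, so their learning-rate multipliers are set to zero
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}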