How should the "BatchNorm" layer be used in Caffe?


Question


I am a little confused about how should I use/insert "BatchNorm" layer in my models.
I see several different approaches, for instance:

"BatchNorm" layer followed immediately by a "Scale" layer:

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "bn2a_branch1"
    type: "BatchNorm"
    batch_norm_param {
        use_global_stats: true
    }
}

layer {
    bottom: "res2a_branch1"
    top: "res2a_branch1"
    name: "scale2a_branch1"
    type: "Scale"
    scale_param {
        bias_term: true
    }
}
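This split mirrors the batch-norm formula from the original paper: Caffe's "BatchNorm" layer only normalizes each channel to zero mean and unit variance, while the learned affine part (the paper's gamma and beta) is supplied by the separate "Scale" layer with bias_term: true. A minimal Python sketch of the combined per-channel computation (the function name and the sample values are illustrative, not from Caffe):

```python
import math

def batch_norm_then_scale(x, mean, var, gamma, beta, eps=1e-5):
    """Normalize (what the "BatchNorm" layer does), then apply the
    learned affine transform (what the "Scale" layer with
    bias_term: true does)."""
    normalized = [(v - mean) / math.sqrt(var + eps) for v in x]  # "BatchNorm"
    return [gamma * v + beta for v in normalized]                # "Scale"

x = [1.0, 2.0, 3.0, 4.0]
mean = sum(x) / len(x)                              # 2.5
var = sum((v - mean) ** 2 for v in x) / len(x)      # 1.25
y = batch_norm_then_scale(x, mean, var, gamma=2.0, beta=0.5)
```

With gamma=1 and beta=0 the "Scale" layer is an identity, which is why omitting it (as in the cifar10 example below) still produces a valid, if less expressive, normalization.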

cifar10 example: only "BatchNorm"

In the cifar10 example provided with caffe, "BatchNorm" is used without any "Scale" following it:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
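As I understand the layer, the three param blocks with lr_mult: 0 correspond to "BatchNorm"'s three internal blobs: the accumulated mean, the accumulated variance, and a moving-average scale factor. These are updated by running averages during the forward pass, not by gradient descent, which is why their learning rates are pinned to zero. A rough Python sketch of how the stored blobs yield the statistics used when use_global_stats is true (simplified from my reading of the layer, not Caffe's actual code):

```python
def global_stats_from_blobs(mean_blob, var_blob, scale_blob):
    """Recover usable statistics from BatchNorm's three stored blobs:
    accumulated mean, accumulated variance, and the moving-average
    scale factor that both accumulators must be divided by."""
    factor = 0.0 if scale_blob == 0 else 1.0 / scale_blob
    mean = [m * factor for m in mean_blob]
    var = [v * factor for v in var_blob]
    return mean, var

mean, var = global_stats_from_blobs([2.0], [4.0], 2.0)
```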

cifar10: different batch_norm_param for TRAIN and TEST

batch_norm_param: use_global_stats is changed between the TRAIN and TEST phases:

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
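The switch above amounts to choosing which statistics are used for normalization: with use_global_stats: false (TRAIN) the layer normalizes with the current mini-batch's mean and variance, while with use_global_stats: true (TEST) it uses the stored running statistics instead. A small Python sketch of that behavior (function and parameter names are illustrative):

```python
import math

def batch_norm_forward(x, running_mean, running_var,
                       use_global_stats, eps=1e-5):
    """Normalize x with either batch statistics (training) or stored
    running statistics (testing), per the use_global_stats flag."""
    if use_global_stats:
        mean, var = running_mean, running_var      # TEST: frozen stats
    else:
        mean = sum(x) / len(x)                     # TRAIN: batch stats
        var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x = [0.0, 2.0]
train_out = batch_norm_forward(x, 0.0, 1.0, use_global_stats=False)
test_out = batch_norm_forward(x, 0.0, 1.0, use_global_stats=True)
```

Note that in train mode the same input always comes out with zero mean regardless of the running statistics, whereas in test mode the output depends entirely on the stored mean and variance.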

So which one should it be?

How should the "BatchNorm" layer be used in caffe?

Answer

If you follow the original paper, batch normalization should be followed by Scale and Bias layers (the bias can be included via the Scale layer, although this makes the bias parameters inaccessible). use_global_stats should also be changed from training (False) to testing/deployment (True), which is the default behavior. Note that the first example you give is a prototxt for deployment, so it is correct for it to be set to True.

I am not sure about the shared parameters.

I made a pull request to improve the documentation on batch normalization, but then closed it because I wanted to modify it, and I never got back to it.

Note that I think lr_mult: 0 for "BatchNorm" is no longer required (perhaps not allowed?), although I'm not finding the corresponding PR now.
