How do I prevent backward computation in specific layers in Caffe?

Problem description

I want to disable backward computation in certain convolution layers in Caffe. How do I do this?
I have tried the propagate_down setting, but found that it works for fc layers and not for convolution layers.

Please help!

First update: I set propagate_down: false in the test/pool_proj layer. I don't want this layer to run backward (the other layers should), but according to the log file the layer still needs backward computation.

Second update: consider a deep learning model with two paths from the input layer to the output layer, p1: A->B->C->D and p2: A->B->C1->D, where A is the input layer, D is an fc layer and the others are conv layers. When the gradient flows backward from D to the earlier layers, p1 should behave like normal backpropagation, but along p2 the gradient should stop at C1: C1's weights are still updated, but C1 does not propagate its error back to the previous layers.
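The behaviour asked for in the second update can be illustrated with a toy scalar version of the two-path model. This is a hand-derived sketch, not Caffe code: it assumes B = wb*A, C = wc*B, C1 = wc1*B, D = C + C1 and a loss equal to D, and the names wb, wc, wc1 and stop_at_c1 are hypothetical.

```python
# Toy scalar model of the two paths (hypothetical, for illustration only):
#   B = wb * A;  C = wc * B;  C1 = wc1 * B;  D = C + C1;  loss L = D


def two_path_grads(a, wb, wc, wc1, stop_at_c1):
    """Return dL/dw for each weight, optionally blocking the error at C1's input."""
    b = wb * a
    # D = C + C1 and L = D, so dL/dC = dL/dC1 = 1.
    dl_dc, dl_dc1 = 1.0, 1.0
    # The weight gradients of C and C1 only need B, so they are always computed.
    dl_dwc = dl_dc * b
    dl_dwc1 = dl_dc1 * b
    # Error reaching B: path p1 always contributes, path p2 only if not stopped at C1.
    dl_db = dl_dc * wc
    if not stop_at_c1:
        dl_db += dl_dc1 * wc1
    dl_dwb = dl_db * a
    return {"wb": dl_dwb, "wc": dl_dwc, "wc1": dl_dwc1}


print(two_path_grads(2.0, 3.0, 0.5, 4.0, stop_at_c1=False))  # {'wb': 9.0, 'wc': 6.0, 'wc1': 6.0}
print(two_path_grads(2.0, 3.0, 0.5, 4.0, stop_at_c1=True))   # {'wb': 1.0, 'wc': 6.0, 'wc1': 6.0}
```

Stopping at C1 leaves the gradient for wc1 unchanged and only removes path p2's contribution to dL/dB, and therefore to wb. In Caffe terms this roughly corresponds to propagate_down: false on C1's bottom while C1 keeps a nonzero lr_mult.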

Prototxt file

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/media/eric/main/data/ImageNet/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 224
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "/media/eric/main/data/ImageNet/ilsvrc12_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "conv1/relu_7x7"
  type: "ReLU"
  bottom: "conv1/7x7_s2"
  top: "conv1/7x7_s2"
}
layer {
  name: "pool1/3x3_s2"
  type: "Pooling"
  bottom: "conv1/7x7_s2"
  top: "pool1/3x3_s2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "pool1/norm1"
  type: "LRN"
  bottom: "pool1/3x3_s2"
  top: "pool1/norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2/3x3_reduce"
  type: "Convolution"
  bottom: "pool1/norm1"
  top: "conv2/3x3_reduce"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "conv2/relu_3x3_reduce"
  type: "ReLU"
  bottom: "conv2/3x3_reduce"
  top: "conv2/3x3_reduce"
}
layer {
  name: "conv2/3x3"
  type: "Convolution"
  bottom: "conv2/3x3_reduce"
  top: "conv2/3x3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 192
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "conv2/relu_3x3"
  type: "ReLU"
  bottom: "conv2/3x3"
  top: "conv2/3x3"
}
layer {
  name: "conv2/norm2"
  type: "LRN"
  bottom: "conv2/3x3"
  top: "conv2/norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2/3x3_s2"
  type: "Pooling"
  bottom: "conv2/norm2"
  top: "pool2/3x3_s2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}


layer {
  name: "test/5x5_reduce"
  type: "Convolution"
  bottom: "pool2/3x3_s2"
  top: "test/5x5_reduce"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 16
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "test/relu_5x5_reduce"
  type: "ReLU"
  bottom: "test/5x5_reduce"
  top: "test/5x5_reduce"
}
layer {
  name: "test/5x5"
  type: "Convolution"
  bottom: "test/5x5_reduce"
  top: "test/5x5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "test/relu_5x5"
  type: "ReLU"
  bottom: "test/5x5"
  top: "test/5x5"
}
layer {
  name: "test/pool"
  type: "Pooling"
  bottom: "pool2/3x3_s2"
  top: "test/pool"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 1
    pad: 1
  }
}
layer {
  name: "test/pool_proj"
  type: "Convolution"
  bottom: "test/pool"
  top: "test/pool_proj"
  propagate_down: false
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    kernel_size: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
layer {
  name: "test/relu_pool_proj"
  type: "ReLU"
  bottom: "test/pool_proj"
  top: "test/pool_proj"
}
layer {
  name: "test/output"
  type: "Concat"
  bottom: "test/5x5"
  bottom: "test/pool_proj"
  top: "test/output"
}

layer {
  name: "test_output/pool"
  type: "Pooling"
  bottom: "test/output"
  top: "test/output"
  pooling_param {
    pool: MAX
    kernel_size: 28
  }
}

layer {
  name: "classifier"
  type: "InnerProduct"
  bottom: "test/output"
  top: "classifier"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

layer {
  name: "loss3"
  type: "SoftmaxWithLoss"
  bottom: "classifier"
  bottom: "label"
  top: "loss3"
  loss_weight: 1
}
layer {
  name: "top-1"
  type: "Accuracy"
  bottom: "classifier"
  bottom: "label"
  top: "top-1"
  include {
    phase: TEST
  }
}
layer {
  name: "top-5"
  type: "Accuracy"
  bottom: "classifier"
  bottom: "label"
  top: "top-5"
  include {
    phase: TEST
  }
  accuracy_param {
    top_k: 5
  }
}

Log

I1116 15:44:04.405261 19358 net.cpp:226] loss3 needs backward computation.
I1116 15:44:04.405283 19358 net.cpp:226] classifier needs backward computation.
I1116 15:44:04.405302 19358 net.cpp:226] test_output/pool needs backward computation.
I1116 15:44:04.405320 19358 net.cpp:226] test/output needs backward computation.
I1116 15:44:04.405339 19358 net.cpp:226] test/relu_pool_proj needs backward computation.
I1116 15:44:04.405357 19358 net.cpp:226] test/pool_proj needs backward computation.
I1116 15:44:04.405375 19358 net.cpp:228] test/pool does not need backward computation.
I1116 15:44:04.405395 19358 net.cpp:226] test/relu_5x5 needs backward computation.
I1116 15:44:04.405412 19358 net.cpp:226] test/5x5 needs backward computation.
I1116 15:44:04.405431 19358 net.cpp:226] test/relu_5x5_reduce needs backward computation.
I1116 15:44:04.405448 19358 net.cpp:226] test/5x5_reduce needs backward computation.
I1116 15:44:04.405468 19358 net.cpp:226] pool2/3x3_s2_pool2/3x3_s2_0_split needs backward computation.
I1116 15:44:04.405485 19358 net.cpp:226] pool2/3x3_s2 needs backward computation.
I1116 15:44:04.405505 19358 net.cpp:226] conv2/norm2 needs backward computation.
I1116 15:44:04.405522 19358 net.cpp:226] conv2/relu_3x3 needs backward computation.
I1116 15:44:04.405542 19358 net.cpp:226] conv2/3x3 needs backward computation.
I1116 15:44:04.405560 19358 net.cpp:226] conv2/relu_3x3_reduce needs backward computation.
I1116 15:44:04.405578 19358 net.cpp:226] conv2/3x3_reduce needs backward computation.
I1116 15:44:04.405596 19358 net.cpp:226] pool1/norm1 needs backward computation.
I1116 15:44:04.405616 19358 net.cpp:226] pool1/3x3_s2 needs backward computation.
I1116 15:44:04.405632 19358 net.cpp:226] conv1/relu_7x7 needs backward computation.
I1116 15:44:04.405652 19358 net.cpp:226] conv1/7x7_s2 needs backward computation.
I1116 15:44:04.405670 19358 net.cpp:228] data does not need backward computation.
I1116 15:44:04.405705 19358 net.cpp:270] This network produces output loss3
I1116 15:44:04.405745 19358 net.cpp:283] Network initialization done.
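In the log above, test/pool (the layer feeding test/pool_proj) is reported as not needing backward computation, which suggests propagate_down: false did take effect; test/pool_proj itself is still listed because its weights have a nonzero lr_mult and must receive gradients. The sketch below is a simplified Python model of the need-backward bookkeeping behind these net.cpp log lines, written from our reading of Caffe's source. force_backward, shared parameters and other edge cases are omitted, and Layer, backward_needs and build_net are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    name: str
    bottoms: list
    tops: list
    lr_mults: list = field(default_factory=list)        # one lr_mult per param blob
    propagate_down: list = field(default_factory=list)  # empty means "not set"
    is_loss: bool = False


def backward_needs(layers):
    """Simplified model of the need-backward bookkeeping in Caffe's Net::Init."""
    blob_nb, layer_nb, bottom_nb = {}, [], []
    # Pass 1, forward order: a layer needs backward if one of its inputs needs a
    # gradient or one of its params has a nonzero lr_mult.
    for layer in layers:
        need = any(blob_nb.get(b, False) for b in layer.bottoms)
        need = need or any(m != 0 for m in layer.lr_mults)
        flags = [blob_nb.get(b, False) for b in layer.bottoms]
        if layer.propagate_down:  # an explicit propagate_down overrides each bottom
            flags = list(layer.propagate_down)
        if need:
            for t in layer.tops:
                blob_nb[t] = True
        layer_nb.append(need)
        bottom_nb.append(flags)
    # Pass 2, reverse order: drop layers that do not feed the loss, and layers whose
    # every output was marked "no gradient needed" by its consumer.
    under_loss, skip = set(), set()
    for i in reversed(range(len(layers))):
        layer = layers[i]
        contributes = layer.is_loss or any(t in under_loss for t in layer.tops)
        if layer_nb[i] and all(t in skip for t in layer.tops):
            layer_nb[i] = False
            bottom_nb[i] = [False] * len(layer.bottoms)
        if not contributes:
            layer_nb[i] = False
        for j, b in enumerate(layer.bottoms):
            if contributes:
                under_loss.add(b)
            else:
                bottom_nb[i][j] = False
            if not bottom_nb[i][j]:
                skip.add(b)
    return {layer.name: need for layer, need in zip(layers, layer_nb)}


def build_net(conv1_lr=(1, 2)):
    """TRAIN-phase layers of the prototxt above; shapes and fillers are omitted."""
    L, S = Layer, "pool2/3x3_s2_pool2/3x3_s2_0_split"
    return [
        L("data", [], ["data", "label"]),
        L("conv1/7x7_s2", ["data"], ["conv1/7x7_s2"], list(conv1_lr)),
        L("conv1/relu_7x7", ["conv1/7x7_s2"], ["conv1/7x7_s2"]),
        L("pool1/3x3_s2", ["conv1/7x7_s2"], ["pool1/3x3_s2"]),
        L("pool1/norm1", ["pool1/3x3_s2"], ["pool1/norm1"]),
        L("conv2/3x3_reduce", ["pool1/norm1"], ["conv2/3x3_reduce"], [1, 2]),
        L("conv2/relu_3x3_reduce", ["conv2/3x3_reduce"], ["conv2/3x3_reduce"]),
        L("conv2/3x3", ["conv2/3x3_reduce"], ["conv2/3x3"], [1, 2]),
        L("conv2/relu_3x3", ["conv2/3x3"], ["conv2/3x3"]),
        L("conv2/norm2", ["conv2/3x3"], ["conv2/norm2"]),
        L("pool2/3x3_s2", ["conv2/norm2"], ["pool2/3x3_s2"]),
        # Split layer that Caffe inserts because pool2/3x3_s2 feeds two layers.
        L(S, ["pool2/3x3_s2"], [S + "_0", S + "_1"]),
        L("test/5x5_reduce", [S + "_0"], ["test/5x5_reduce"], [1, 2]),
        L("test/relu_5x5_reduce", ["test/5x5_reduce"], ["test/5x5_reduce"]),
        L("test/5x5", ["test/5x5_reduce"], ["test/5x5"], [1, 2]),
        L("test/relu_5x5", ["test/5x5"], ["test/5x5"]),
        L("test/pool", [S + "_1"], ["test/pool"]),
        L("test/pool_proj", ["test/pool"], ["test/pool_proj"], [1, 2], [False]),
        L("test/relu_pool_proj", ["test/pool_proj"], ["test/pool_proj"]),
        L("test/output", ["test/5x5", "test/pool_proj"], ["test/output"]),
        L("test_output/pool", ["test/output"], ["test/output"]),
        L("classifier", ["test/output"], ["classifier"], [1, 2]),
        L("loss3", ["classifier", "label"], ["loss3"], is_loss=True),
    ]


needs = backward_needs(build_net())
print([n for n, need in needs.items() if not need])
# ['data', 'test/pool']
frozen = backward_needs(build_net(conv1_lr=(0, 0)))
print([n for n, need in frozen.items() if not need])
# ['data', 'conv1/7x7_s2', 'conv1/relu_7x7', 'pool1/3x3_s2', 'pool1/norm1', 'test/pool']
```

The first call reproduces the log: only data and test/pool skip backward. The second call models the lr_mult: 0 change from the answer below applied to conv1/7x7_s2: the frozen convolution and the parameter-free layers between it and conv2/3x3_reduce drop out of the backward pass.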

Recommended answer

From Evan Shelhamer (https://groups.google.com/forum/#!topic/caffe-users/54Z-B-CXmLE):

propagate_down is intended to switch off backprop along certain paths from the loss while not entirely turning off layers earlier in the graph. If gradients propagate to a layer by another path, or regularization such as weight decay is not disabled, the parameters of these layers will still be updated. I suspect decay is still on for these layers, so you could set decay_mult: 0 for the weights and biases.
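To see why weight decay alone can still move parameters, here is a simplified re-implementation of one SGD step with L2 weight decay and momentum, following our reading of Caffe's SGD solver: the effective rate is base_lr * lr_mult and the effective decay is weight_decay * decay_mult. The function name sgd_step and the numeric values are hypothetical.

```python
def sgd_step(w, grad, history, base_lr, lr_mult, weight_decay, decay_mult, momentum=0.9):
    """One simplified SGD update with L2 weight decay, modelled on Caffe's SGD solver."""
    local_rate = base_lr * lr_mult
    local_decay = weight_decay * decay_mult
    diff = grad + local_decay * w                     # L2 decay is added to the gradient
    history = momentum * history + local_rate * diff  # momentum buffer
    return w - history, history                       # new weight, new history


# The gradient is blocked (grad = 0), e.g. by propagate_down: false upstream.
hyper = dict(base_lr=0.01, weight_decay=0.0002)
w_decay, _ = sgd_step(1.0, 0.0, 0.0, lr_mult=1, decay_mult=1, **hyper)
w_nodecay, _ = sgd_step(1.0, 0.0, 0.0, lr_mult=1, decay_mult=0, **hyper)
w_frozen, _ = sgd_step(1.0, 0.0, 0.0, lr_mult=0, decay_mult=1, **hyper)
print(w_decay < 1.0, w_nodecay, w_frozen)  # True 1.0 1.0
```

The first case is the situation described in the quote: no gradient arrives, yet the weight still shrinks. Setting either decay_mult: 0 or lr_mult: 0 stops that change, and lr_mult: 0 also zeroes the update from any gradient that does arrive.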

Setting lr_mult: 0, on the other hand, fixes the parameters and skips backprop where it is unnecessary.

You have decay_mult: 1 in some of the early layers, so weight decay can still change those parameters. Set lr_mult: 0 in every layer whose weights you don't want updated.

For example, change the following:

layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

to:

layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 0
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
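Editing many layers by hand is error-prone, so the same change can be scripted. The helper below is a hypothetical, stdlib-only sketch (freeze_layers is our name, not a Caffe API): it sets every lr_mult inside the named top-level layer blocks to 0 and leaves decay_mult alone, as in the example above. It assumes each block starts on a line beginning with `layer`, that the layer's own name: line is the first name: in the block, and that no braces appear inside quoted strings. Parsing with pycaffe's caffe_pb2 and google.protobuf.text_format would be more robust.

```python
import re


def freeze_layers(prototxt: str, names: set) -> str:
    """Set every lr_mult inside the named top-level layer { ... } blocks to 0."""
    out, block, depth = [], [], 0
    for line in prototxt.splitlines(keepends=True):
        if depth == 0 and not line.strip().startswith("layer"):
            out.append(line)  # text between layer blocks is copied unchanged
            continue
        block.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:  # one complete top-level layer block has been collected
            text = "".join(block)
            m = re.search(r'^\s*name:\s*"([^"]*)"', text, re.M)
            if m and m.group(1) in names:
                text = re.sub(r"\b(lr_mult:\s*)[-+0-9.eE]+", r"\g<1>0", text)
            out.append(text)
            block = []
    out.extend(block)  # keep an unterminated trailing block as-is
    return "".join(out)


example = """layer {
  name: "conv1/7x7_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
}
"""
print(freeze_layers(example, {"conv1/7x7_s2"}).count("lr_mult: 0"))  # 2
```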
