How do I prevent backward computation in specific layers in caffe
Problem description
I want to disable the backward computation in certain convolution layers in caffe. How do I do this?

I have used the propagate_down setting, but found that it works for fc layers and not for convolution layers.

Please help~
First update: I set propagate_down: false in the test/pool_proj layer. I don't want it to backpropagate (but the other layers should). However, the log file says that the layer still needs backward computation.

Second update: Consider a deep learning model with two paths from the input layer to the output layer, p1: A->B->C->D and p2: A->B->C1->D, where A is the input layer, D is an fc layer, and the rest are conv layers. When the gradient is propagated backward from D to the previous layers, p1 is no different from the normal backward procedure, but p2 should stop at C1 (the weights of the C1 layer are still updated; it just does not propagate its error back to the previous layers).
prototxt
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mirror: true
crop_size: 224
mean_value: 104
mean_value: 117
mean_value: 123
}
data_param {
source: "/media/eric/main/data/ImageNet/ilsvrc12_train_lmdb"
batch_size: 32
backend: LMDB
}
}
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 224
mean_value: 104
mean_value: 117
mean_value: 123
}
data_param {
source: "/media/eric/main/data/ImageNet/ilsvrc12_val_lmdb"
batch_size: 50
backend: LMDB
}
}
layer {
name: "conv1/7x7_s2"
type: "Convolution"
bottom: "data"
top: "conv1/7x7_s2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 3
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "conv1/relu_7x7"
type: "ReLU"
bottom: "conv1/7x7_s2"
top: "conv1/7x7_s2"
}
layer {
name: "pool1/3x3_s2"
type: "Pooling"
bottom: "conv1/7x7_s2"
top: "pool1/3x3_s2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "pool1/norm1"
type: "LRN"
bottom: "pool1/3x3_s2"
top: "pool1/norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2/3x3_reduce"
type: "Convolution"
bottom: "pool1/norm1"
top: "conv2/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "conv2/relu_3x3_reduce"
type: "ReLU"
bottom: "conv2/3x3_reduce"
top: "conv2/3x3_reduce"
}
layer {
name: "conv2/3x3"
type: "Convolution"
bottom: "conv2/3x3_reduce"
top: "conv2/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "conv2/relu_3x3"
type: "ReLU"
bottom: "conv2/3x3"
top: "conv2/3x3"
}
layer {
name: "conv2/norm2"
type: "LRN"
bottom: "conv2/3x3"
top: "conv2/norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2/3x3_s2"
type: "Pooling"
bottom: "conv2/norm2"
top: "pool2/3x3_s2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "test/5x5_reduce"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "test/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 16
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "test/relu_5x5_reduce"
type: "ReLU"
bottom: "test/5x5_reduce"
top: "test/5x5_reduce"
}
layer {
name: "test/5x5"
type: "Convolution"
bottom: "test/5x5_reduce"
top: "test/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "test/relu_5x5"
type: "ReLU"
bottom: "test/5x5"
top: "test/5x5"
}
layer {
name: "test/pool"
type: "Pooling"
bottom: "pool2/3x3_s2"
top: "test/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "test/pool_proj"
type: "Convolution"
bottom: "test/pool"
top: "test/pool_proj"
propagate_down: false
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "test/relu_pool_proj"
type: "ReLU"
bottom: "test/pool_proj"
top: "test/pool_proj"
}
layer {
name: "test/output"
type: "Concat"
bottom: "test/5x5"
bottom: "test/pool_proj"
top: "test/output"
}
layer {
name: "test_output/pool"
type: "Pooling"
bottom: "test/output"
top: "test/output"
pooling_param {
pool: MAX
kernel_size: 28
}
}
layer {
name: "classifier"
type: "InnerProduct"
bottom: "test/output"
top: "classifier"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "loss3"
type: "SoftmaxWithLoss"
bottom: "classifier"
bottom: "label"
top: "loss3"
loss_weight: 1
}
layer {
name: "top-1"
type: "Accuracy"
bottom: "classifier"
bottom: "label"
top: "top-1"
include {
phase: TEST
}
}
layer {
name: "top-5"
type: "Accuracy"
bottom: "classifier"
bottom: "label"
top: "top-5"
include {
phase: TEST
}
accuracy_param {
top_k: 5
}
}
Log
I1116 15:44:04.405261 19358 net.cpp:226] loss3 needs backward computation.
I1116 15:44:04.405283 19358 net.cpp:226] classifier needs backward computation.
I1116 15:44:04.405302 19358 net.cpp:226] test_output/pool needs backward computation.
I1116 15:44:04.405320 19358 net.cpp:226] test/output needs backward computation.
I1116 15:44:04.405339 19358 net.cpp:226] test/relu_pool_proj needs backward computation.
I1116 15:44:04.405357 19358 net.cpp:226] test/pool_proj needs backward computation.
I1116 15:44:04.405375 19358 net.cpp:228] test/pool does not need backward computation.
I1116 15:44:04.405395 19358 net.cpp:226] test/relu_5x5 needs backward computation.
I1116 15:44:04.405412 19358 net.cpp:226] test/5x5 needs backward computation.
I1116 15:44:04.405431 19358 net.cpp:226] test/relu_5x5_reduce needs backward computation.
I1116 15:44:04.405448 19358 net.cpp:226] test/5x5_reduce needs backward computation.
I1116 15:44:04.405468 19358 net.cpp:226] pool2/3x3_s2_pool2/3x3_s2_0_split needs backward computation.
I1116 15:44:04.405485 19358 net.cpp:226] pool2/3x3_s2 needs backward computation.
I1116 15:44:04.405505 19358 net.cpp:226] conv2/norm2 needs backward computation.
I1116 15:44:04.405522 19358 net.cpp:226] conv2/relu_3x3 needs backward computation.
I1116 15:44:04.405542 19358 net.cpp:226] conv2/3x3 needs backward computation.
I1116 15:44:04.405560 19358 net.cpp:226] conv2/relu_3x3_reduce needs backward computation.
I1116 15:44:04.405578 19358 net.cpp:226] conv2/3x3_reduce needs backward computation.
I1116 15:44:04.405596 19358 net.cpp:226] pool1/norm1 needs backward computation.
I1116 15:44:04.405616 19358 net.cpp:226] pool1/3x3_s2 needs backward computation.
I1116 15:44:04.405632 19358 net.cpp:226] conv1/relu_7x7 needs backward computation.
I1116 15:44:04.405652 19358 net.cpp:226] conv1/7x7_s2 needs backward computation.
I1116 15:44:04.405670 19358 net.cpp:228] data does not need backward computation.
I1116 15:44:04.405705 19358 net.cpp:270] This network produces output loss3
I1116 15:44:04.405745 19358 net.cpp:283] Network initialization done.
Recommended answer
From Evan Shelhamer (https://groups.google.com/forum/#!topic/caffe-users/54Z-B-CXmLE):
propagate_down is intended to switch off backprop along certain paths from the loss while not entirely turning off layers earlier in the graph. If gradients propagate to a layer by another path, or regularization such as weight decay is not disabled, the parameters of these layers will still be updated. I suspect decay is still on for these layers, so you could set decay_mult: 0 for the weights and biases.

Setting lr_mult: 0 on the other hand fixes parameters and skips backprop where it is unnecessary.
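As a minimal sketch of that decay_mult: 0 suggestion (which layer to apply it to depends on which parameters should stop receiving weight decay; the lr_mult values are simply the ones used in the prototxt above), the two param blocks of such a layer would become:

param {
  lr_mult: 1
  decay_mult: 0  # switch off weight decay for the weights
}
param {
  lr_mult: 2
  decay_mult: 0  # switch off weight decay for the bias
}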
You have decay_mult: 1 in some of the early layers, so the gradients are still calculated. Set lr_mult: 0 in all of the layers whose weights you do not want updated.
For example, change this:
layer {
name: "conv1/7x7_s2"
type: "Convolution"
bottom: "data"
top: "conv1/7x7_s2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 3
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
to:
layer {
name: "conv1/7x7_s2"
type: "Convolution"
bottom: "data"
top: "conv1/7x7_s2"
param {
lr_mult: 0
decay_mult: 1
}
param {
lr_mult: 0
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 3
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
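The same change would go into any other layer whose weights should stay fixed; only the two param blocks need to be edited. For instance (a fragment only, using conv2/3x3_reduce from the prototxt above as an arbitrary example; the rest of the layer definition stays as it is):

param {
  lr_mult: 0   # freeze the weights
  decay_mult: 1
}
param {
  lr_mult: 0   # freeze the bias
  decay_mult: 0
}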
Also for reference: