What is the `weight_decay` meta parameter in Caffe?
Question
Looking at an example 'solver.prototxt', posted on the BVLC/caffe git, there is a training meta parameter:

weight_decay: 0.04

What does this meta parameter mean? And what value should I assign to it?
Answer
The weight_decay meta parameter governs the regularization term of the neural net.

During training, a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term will be in the gradient computation.
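The effect on a single update step can be sketched as follows. This is a minimal NumPy illustration of SGD with L2 weight decay, not Caffe's actual solver code; the array values and variable names are made up for the example:

```python
import numpy as np

# Hypothetical weights and data-loss gradient for one parameter blob
w = np.array([0.5, -0.3, 0.8])
data_grad = np.array([0.1, -0.2, 0.05])

weight_decay = 0.04   # the solver meta parameter
lr = 0.01             # base learning rate

# With L2 regularization, the decay term contributes weight_decay * w
# to the gradient, so the total gradient is d(loss)/dw + weight_decay * w
total_grad = data_grad + weight_decay * w

# Plain SGD step using the combined gradient
w_new = w - lr * total_grad
```

A larger weight_decay makes the `weight_decay * w` term dominate `data_grad`, pulling the weights toward zero more aggressively.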
As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., a deeper net, larger filters, larger InnerProduct layers, etc.), the higher this term should be.
Caffe also allows you to choose between L2 regularization (the default) and L1 regularization, by setting

regularization_type: "L1"
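In the solver prototxt, the two settings sit side by side (the values here are illustrative only, not a recommendation):

```
# solver.prototxt (illustrative values)
weight_decay: 0.0005
regularization_type: "L1"
```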
However, since in most cases the weights are small numbers (i.e., -1 < w < 1), the L2 norm of the weights is significantly smaller than their L1 norm. Thus, if you choose to use regularization_type: "L1", you might need to tune weight_decay to a significantly smaller value.
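The gap between the two norms is easy to verify numerically. A quick sketch, assuming weights drawn uniformly from (-1, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=1000)  # typical small weights, -1 < w < 1

l1 = np.abs(w).sum()    # L1 norm: sum of |w_i|
l2 = (w ** 2).sum()     # squared L2 penalty: sum of w_i^2

# Because |w_i| < 1, each w_i^2 < |w_i|, so l2 < l1
```

This is why the same weight_decay value penalizes much more strongly under L1 than under L2.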
While the learning rate may (and usually does) change during training, the regularization weight is fixed throughout.