What is `weight_decay` meta parameter in Caffe?
Question
Looking at an example 'solver.prototxt', posted on BVLC/caffe git, there is a training meta parameter
weight_decay: 0.04
What does this meta parameter mean? And what value should I assign to it?
Answer
The `weight_decay` meta parameter governs the regularization term of the neural net.
During training, a regularization term is added to the network's loss to compute the backprop gradient. The `weight_decay` value determines how dominant this regularization term will be in the gradient computation.
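To make this concrete, here is a minimal sketch in plain Python (not Caffe's actual implementation) of how an L2 weight-decay term enters one SGD update; the names `sgd_step`, `weights`, `grad_loss`, and the learning rate `lr` are illustrative:

```python
# Minimal sketch of an SGD step with L2 weight decay (illustrative, not Caffe code).
# With total loss L(w) + (weight_decay / 2) * sum(w_i ** 2), the gradient of the
# penalty with respect to each weight w_i is simply weight_decay * w_i.

weight_decay = 0.04   # same value as in the solver.prototxt above
lr = 0.01             # hypothetical learning rate

def sgd_step(weights, grad_loss):
    """One update: data-loss gradient plus the weight-decay contribution."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grad_loss)]

weights = [0.5, -0.3]       # toy weights
grad_loss = [0.1, -0.2]     # toy data-loss gradients
new_weights = sgd_step(weights, grad_loss)
# Each weight is pulled slightly toward zero by the decay term.
```

The larger `weight_decay` is, the stronger this pull toward zero relative to the data-loss gradient.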
As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., a deeper net, larger filters, larger InnerProduct layers, etc.), the higher this term should be.
Caffe also allows you to choose between `L2` regularization (the default) and `L1` regularization, by setting
regularization_type: "L1"
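For example, a solver fragment combining the two settings might look like this (the values are illustrative, not a recommendation; only `weight_decay` and `regularization_type` are the parameters discussed here):

```protobuf
# Hypothetical solver.prototxt fragment (illustrative values)
weight_decay: 0.0005
regularization_type: "L1"
```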
However, since in most cases the weights are small numbers (i.e., `-1 < w < 1`), the `L2` norm of the weights is significantly smaller than their `L1` norm. Thus, if you choose to use `regularization_type: "L1"`, you might need to tune `weight_decay` to a significantly smaller value.
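A quick illustrative check of this claim (plain Python, toy weights I made up; since `|w| < 1` implies `w**2 < |w|`, the L2 penalty is the smaller of the two):

```python
# Compare L1 and (squared) L2 penalty magnitudes for small weights in (-1, 1).

weights = [0.5, -0.3, 0.1, -0.05]  # toy weights, all with magnitude below 1

l1_norm = sum(abs(w) for w in weights)     # ≈ 0.95
l2_norm_sq = sum(w * w for w in weights)   # ≈ 0.3525, smaller than the L1 norm

# To keep weight_decay * penalty at a comparable scale, an L1 run would
# typically need a smaller weight_decay than an L2 run.
```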
While the learning rate may (and usually does) change during training, the regularization weight is fixed throughout.