What is the `weight_decay` meta parameter in Caffe?

Question

Looking at an example 'solver.prototxt' posted on the BVLC/caffe git repository, there is a training meta parameter:

weight_decay: 0.04

What does this meta parameter mean? And what value should I assign to it?

Answer

The weight_decay meta parameter governs the regularization term of the neural net.

During training, a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term will be in the gradient computation.
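
This effect can be sketched in a few lines of NumPy. The arrays below are toy values for illustration, not anything Caffe produces; Caffe applies the equivalent update internally in its solver:

import numpy as np

weight_decay = 0.04

# Toy weights and the gradient of the data-fitting loss w.r.t. them
# (in Caffe both come out of the net's backward pass).
weights = np.array([0.5, -0.3, 0.8])
data_grad = np.array([0.1, -0.2, 0.05])

# With the default L2 regularization, a term proportional to
# (weight_decay / 2) * sum(w ** 2) is added to the loss, so the
# gradient picks up an extra weight_decay * w per parameter.
total_grad = data_grad + weight_decay * weights

print(total_grad)  # [ 0.12  -0.212  0.082]

The larger weight_decay is, the more this extra term pulls every weight toward zero relative to the data-driven gradient.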

As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., a deeper net, larger filters, larger InnerProduct layers, etc.), the higher this term should be.

Caffe also allows you to choose between L2 regularization (the default) and L1 regularization by setting

regularization_type: "L1"

However, since in most cases the weights are small numbers (i.e., -1 < w < 1), the L2 norm of the weights is significantly smaller than their L1 norm. Thus, if you choose to use regularization_type: "L1", you might need to tune weight_decay to a significantly smaller value.
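
To see how much smaller, here is a back-of-the-envelope comparison; the weight values are made up, and the 1/2 factor in the L2 term follows the usual convention, which may differ from Caffe's by a constant:

import numpy as np

# Typical small weights, all inside (-1, 1)
w = np.array([0.2, -0.05, 0.4, -0.15, 0.1])

l2_penalty = 0.5 * np.sum(w ** 2)  # conventional L2 term (before weight_decay)
l1_penalty = np.sum(np.abs(w))     # L1 term (before weight_decay)

print(l2_penalty)  # 0.1175
print(l1_penalty)  # 0.9

# Here the L1 penalty is roughly 8x larger, so a weight_decay value
# tuned for L2 would over-regularize under L1 and should be scaled
# down accordingly.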

While the learning rate may (and usually does) change during training, the regularization weight is fixed throughout.
