What is the `weight_decay` meta parameter in Caffe?

Question

Looking at an example 'solver.prototxt', posted on BVLC/caffe git, there is a training meta parameter

weight_decay: 0.04
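
For context, that line typically sits among the other solver settings. A minimal sketch of a solver.prototxt is shown below (the field names come from Caffe's SolverParameter; the values are illustrative, not taken from the original example):

net: "train_val.prototxt"
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
momentum: 0.9
weight_decay: 0.04
max_iter: 350000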

What does this meta parameter mean? And what value should I assign to it?

Answer

The weight_decay meta parameter governs the regularization term of the neural net.

During training, a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term will be in the gradient computation.
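
For concreteness, here is a minimal sketch in plain Python/NumPy (not Caffe's actual code) of how the weight_decay term enters a single SGD step:

import numpy as np

def sgd_step(w, data_grad, lr=0.01, weight_decay=0.04, reg="L2"):
    # Gradient contributed by the regularization term:
    #   L2: d/dw [(weight_decay / 2) * ||w||^2] = weight_decay * w
    #   L1: d/dw [weight_decay * ||w||_1]       = weight_decay * sign(w)
    if reg == "L2":
        reg_grad = weight_decay * w
    else:
        reg_grad = weight_decay * np.sign(w)
    # Plain SGD on the combined gradient (momentum omitted for brevity)
    return w - lr * (data_grad + reg_grad)

w = sgd_step(np.array([0.5, -0.3]), np.array([0.1, 0.2]))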

As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., a deeper net, larger filters, larger InnerProduct layers, etc.), the higher this term should be.

Caffe also allows you to choose between L2 regularization (the default) and L1 regularization, by setting

regularization_type: "L1"

However, since in most cases the weights are small numbers (i.e., -1 < w < 1), the L2 norm of the weights is significantly smaller than their L1 norm. Thus, if you choose to use regularization_type: "L1", you might need to tune weight_decay to a significantly smaller value.
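
To get a feel for the size gap, here is a quick illustrative computation (the weight values below are made up):

import numpy as np

w = np.array([0.1, -0.2, 0.05, 0.3])   # hypothetical weights, all in (-1, 1)
l1 = np.abs(w).sum()                   # L1 penalty: 0.65
l2 = 0.5 * (w ** 2).sum()              # L2 penalty (0.5 * ||w||^2): 0.07125
print(l1 / l2)                         # ~9x gap -> use a smaller weight_decay with L1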

While the learning rate may (and usually does) change during training, the regularization weight is fixed throughout.
