Caffe: what will happen if two layers backprop gradients to the same bottom blob?


Problem Description

I'm wondering what happens if I have a layer generating a bottom blob that is further consumed by two subsequent layers, both of which will generate some gradients to fill bottom.diff in the back-propagation stage. Will the two gradients be added up to form the final gradient? Or will only one of them survive? In my understanding, Caffe layers need to memset the bottom.diff to all zeros before filling it with the computed gradients, right? Will that memset wipe out the gradient already computed by the other layer? Thank you!

Recommended Answer

Using more than a single loss layer is not out-of-the-ordinary; see GoogLeNet for example: it has three loss layers "pushing" gradients at different depths of the net.
In Caffe, each loss layer has an associated loss_weight: how much this particular component contributes to the loss function of the net. Thus, if your net has two loss layers, Loss1 and Loss2, the overall loss of your net is

Loss = loss_weight1*Loss1 + loss_weight2*Loss2
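
For concreteness, here is a minimal prototxt sketch of such a setup; the layer and blob names ("feat", "loss_cls", "loss_aux", "label_cls", "label_aux") and the loss_weight values are made up for illustration:

# Hypothetical fragment: the blob "feat" is consumed by two loss layers.
layer {
  name: "loss_cls"
  type: "SoftmaxWithLoss"
  bottom: "feat"
  bottom: "label_cls"
  top: "loss_cls"
  loss_weight: 1.0    # this loss contributes with weight 1.0 to the overall loss
}
layer {
  name: "loss_aux"
  type: "SoftmaxWithLoss"
  bottom: "feat"
  bottom: "label_aux"
  top: "loss_aux"
  loss_weight: 0.3    # the auxiliary loss contributes with a smaller weight
}

Tuning the two loss_weight values controls the trade-off between the two objectives during training.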

Backpropagation uses the chain rule to propagate the gradient of Loss (the overall loss) through all the layers in the net. The chain rule breaks the derivative of Loss down into partial derivatives, i.e., the derivatives of each layer; the overall effect is obtained by propagating the gradients through these partial derivatives. That is, when a layer's backward() function uses top.diff to compute bottom.diff, it takes into account not only the layer's own derivative but also the effect of ALL higher layers, which is already expressed in top.diff.
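
Concretely, for the bottom blob b from the question, which is consumed by two branches ending in Loss1 and Loss2, the chain rule gives (in the same notation as above):

bottom.diff = dLoss/db = loss_weight1 * dLoss1/db + loss_weight2 * dLoss2/db

Caffe performs this summation for you: when a blob is consumed by more than one layer, the net automatically inserts a Split layer, and its backward pass adds up the diffs coming from each consumer instead of overwriting them, so no memset wipes out the other branch's gradient.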

TL;DR
You can have multiple loss layers. Caffe (as well as any other decent deep learning framework) handles it seamlessly for you.
