Caffe: what will happen if two layers backprop gradients to the same bottom blob?

Problem Description

I have a layer that produces a bottom blob, and that blob is consumed by two subsequent layers, both of which will generate gradients to fill bottom.diff during the back-propagation stage. Will the two gradients be added up to form the final gradient, or will only one of them survive? In my understanding, a Caffe layer needs to memset bottom.diff to all zeros before filling it with its computed gradients, right? Won't that memset wipe out the gradients already computed by the other layer? Thank you!

Recommended Answer

Using more than a single loss layer is not out of the ordinary; see GoogLeNet, for example: it has three loss layers "pushing" gradients at different depths of the net.
In Caffe, each loss layer has an associated loss_weight, which determines how much that particular component contributes to the net's loss function. Thus, if your net has two loss layers, Loss1 and Loss2, the overall loss of your net is

Loss = loss_weight1*Loss1 + loss_weight2*Loss2
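
To make the formula concrete, here is a minimal NumPy sketch (plain Python, not Caffe code; the predictions, targets, and loss_weight values are made up for illustration). It shows how two weighted loss components form one overall objective, and how each loss_weight also scales the gradient its loss layer pushes back:

    import numpy as np

    # Hypothetical weights for the two loss components.
    loss_weight1, loss_weight2 = 1.0, 0.5

    pred1 = np.array([0.2, 0.8])    # hypothetical predictions feeding Loss1
    pred2 = np.array([0.6, 0.4])    # hypothetical predictions feeding Loss2
    target = np.array([0.0, 1.0])

    # Two Euclidean-style loss components (chosen only for simplicity).
    loss1 = 0.5 * np.sum((pred1 - target) ** 2)
    loss2 = 0.5 * np.sum((pred2 - target) ** 2)

    # The net's overall objective, exactly as in the formula above.
    loss = loss_weight1 * loss1 + loss_weight2 * loss2

    # Each loss component's gradient is scaled by its weight, so
    # d(loss)/d(pred1) = loss_weight1 * d(loss1)/d(pred1), and likewise for pred2.
    grad1 = loss_weight1 * (pred1 - target)
    grad2 = loss_weight2 * (pred2 - target)

    print(loss)           # 0.22
    print(grad1, grad2)   # [ 0.2 -0.2] [ 0.3 -0.3]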

Backpropagation uses the chain rule to propagate the gradient of Loss (the overall loss) through all the layers in the net. The chain rule breaks the derivative of Loss into partial derivatives, i.e., the derivatives of each layer; the overall effect is obtained by propagating the gradients through these partial derivatives. That is, by using top.diff and the layer's backward() function to compute bottom.diff, one takes into account not only the layer's own derivative but also the effect of ALL higher layers, as expressed in top.diff.
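
The question's scenario, one bottom blob consumed by two layers, follows the same chain rule: the partial derivatives coming from both consumers are summed, so neither one overwrites the other. Below is a toy NumPy sketch (not Caffe internals; the consumer functions y1 = 2*x and y2 = x**2 are made up for illustration) of that accumulation:

    import numpy as np

    # One shared bottom blob x feeds two consumers, y1 = 2*x and y2 = x**2,
    # and the overall loss is L = sum(y1) + sum(y2).
    x = np.array([1.0, 3.0])

    top1_diff = np.ones_like(x)    # dL/dy1, i.e. the first consumer's top.diff
    top2_diff = np.ones_like(x)    # dL/dy2, i.e. the second consumer's top.diff

    # Each consumer's backward() maps its top.diff to a contribution to bottom.diff.
    diff_from_consumer1 = top1_diff * 2.0        # d(2*x)/dx  = 2
    diff_from_consumer2 = top2_diff * 2.0 * x    # d(x**2)/dx = 2x

    # By the multivariable chain rule, the shared bottom's gradient is the SUM
    # of both contributions -- neither one "survives" at the expense of the other.
    bottom_diff = diff_from_consumer1 + diff_from_consumer2

    print(bottom_diff)   # [4. 8.], i.e. 2 + 2*x

This summation is the accumulation the TL;DR below refers to: the framework performs it for you, so nothing you computed in one branch is lost.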

TL;DR
You can have multiple loss layers. Caffe (as well as any other decent deep-learning framework) handles this seamlessly for you.
