What does the parameter retain_graph mean in the Variable's backward() method?


Problem description


I'm going through the neural transfer PyTorch tutorial and am confused about the use of retain_variable (deprecated, now referred to as retain_graph). The code example shows:

import torch.nn as nn

class ContentLoss(nn.Module):

    def __init__(self, target, weight):
        super(ContentLoss, self).__init__()
        self.target = target.detach() * weight
        self.weight = weight
        self.criterion = nn.MSELoss()

    def forward(self, input):
        self.loss = self.criterion(input * self.weight, self.target)
        self.output = input
        return self.output

    def backward(self, retain_variables=True):
        # Why is retain_variables True??
        self.loss.backward(retain_variables=retain_variables)
        return self.loss

From the documentation:


retain_graph (bool, optional) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.


So by setting retain_graph=True, we're not freeing the memory allocated for the graph on the backward pass. What is the advantage of keeping this memory around, and why do we need it?
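For reference, the freeing behaviour described in the documentation can be reproduced in a few lines. The following is only a minimal sketch, assuming a recent PyTorch where plain tensors carry requires_grad (so the Variable wrapper from the tutorial is not needed):

import torch

x = torch.rand(1, 4, requires_grad=True)
y = (x ** 2).sum()   # pow saves x, which is needed for its backward pass

y.backward()          # fine; the saved buffers are freed afterwards by default
try:
    y.backward()      # second backward over the same, now freed, graph
except RuntimeError as err:
    print(err)        # complains about backpropagating through the graph a second time

z = (x ** 2).sum()
z.backward(retain_graph=True)   # keep the buffers around ...
z.backward()                    # ... so a second backward pass succeeds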

Answer


@cleros is pretty much on point about the use of retain_graph=True. In essence, it retains any information necessary to compute the gradients of a certain variable, so that we can do a backward pass on it.


Suppose that we have the computation graph built by the code below, where the variables d and e are the outputs and a is the input. For example,

import torch
from torch.autograd import Variable
a = Variable(torch.rand(1, 4), requires_grad=True)
b = a**2
c = b*2
d = c.mean()  # both outputs d and e share the intermediate results a, b and c
e = c.sum()


When we do d.backward(), that is fine. After this computation, the part of the graph that computes d will be freed by default to save memory. So if we then do e.backward(), an error message will pop up. In order to do e.backward(), we have to set the parameter retain_graph to True in d.backward(), i.e.,

d.backward(retain_graph=True)


As long as you use retain_graph=True in your backward calls, you can do a backward pass any time you want:

d.backward(retain_graph=True) # fine
e.backward(retain_graph=True) # fine
d.backward() # also fine
e.backward() # error will occur!
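The documentation's remark that retaining the graph "often can be worked around in a much more efficient way" usually means doing all the backward passes in one go. Here is a minimal sketch of that workaround for the example above (again assuming a recent PyTorch with plain tensors instead of Variable; this is an alternative, not the original answer's code):

import torch

a = torch.rand(1, 4, requires_grad=True)
b = a ** 2
c = b * 2
d = c.mean()
e = c.sum()

# A single traversal of the graph computes the gradients of both outputs,
# so nothing has to be kept around for a later backward call.
torch.autograd.backward([d, e])
print(a.grad)   # gradients from d and e accumulated together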


More useful discussion can be found here.


A real use case is multi-task learning, where you have multiple losses that may sit at different layers. Suppose that you have two losses, loss1 and loss2, which reside at different layers. In order to backprop the gradients of loss1 and loss2 w.r.t. the learnable weights of your network independently, you have to use retain_graph=True in the backward() method of the first back-propagated loss.

# suppose you first back-propagate loss1, then loss2 (you can also do the reverse)
loss1.backward(retain_graph=True)
loss2.backward() # now the graph is freed, and next process of batch gradient descent is ready
optimizer.step() # update the network parameters
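To make that pattern concrete, here is a self-contained sketch; the two-head network and all of its names (trunk, head1, head2) are made up for illustration:

import torch
import torch.nn as nn

# Hypothetical two-head network: a shared trunk feeding two task-specific heads.
trunk = nn.Linear(10, 16)
head1 = nn.Linear(16, 1)
head2 = nn.Linear(16, 1)
params = list(trunk.parameters()) + list(head1.parameters()) + list(head2.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

x = torch.randn(4, 10)
target1 = torch.randn(4, 1)
target2 = torch.randn(4, 1)

features = trunk(x)                 # shared part of the graph
loss1 = nn.functional.mse_loss(head1(features), target1)
loss2 = nn.functional.mse_loss(head2(features), target2)

optimizer.zero_grad()
loss1.backward(retain_graph=True)   # keep the shared graph alive for the second backward
loss2.backward()                    # the graph is freed after this call
optimizer.step()

If the two gradients do not need to be computed separately, summing the losses and calling (loss1 + loss2).backward() once is the usual, cheaper alternative.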
