Pytorch - Getting gradient for intermediate variables / tensors


Question

As an exercise in the PyTorch framework (0.4.1), I am trying to display the gradient of X (gX or dSdX) in a simple Linear layer (Z = X.W + B). To simplify my toy example, I backward() from a sum of Z (not a loss).

To sum up, I want gX (dSdX) of S = sum(XW + B).

The problem is that the gradient of Z (dSdZ) is None. As a result, gX is of course wrong too.

import torch
X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

Result:

Z:
 tensor([[2.1500, 2.9100],
        [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
 None
gX:
 tensor([[ 3.6000, -0.9000,  1.3000],
        [ 3.6000, -0.9000,  1.3000]])

I have exactly the same result if I use nn.Module as below:

class Net1Linear(torch.nn.Module):
    def __init__(self, wi, wo,W,B):
        super(Net1Linear, self).__init__()
        self.linear1 = torch.nn.Linear(wi, wo)
        self.linear1.weight = torch.nn.Parameter(W.t())
        self.linear1.bias = torch.nn.Parameter(B)
    def forward(self, x):
        return self.linear1(x)
net = Net1Linear(3,2,W,B)
Z = net(X)
S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

Answer

First of all, you only calculate gradients for tensors where you enable the gradient by setting requires_grad to True.

So your output is just as one would expect. You get the gradient for X.
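
As a quick check, a minimal sketch using the same W as in the question: for S = sum(XW + B) the chain rule gives dS/dZ = a matrix of ones and dS/dX = dS/dZ · Wᵀ, which reproduces the printed gX:

import torch

# dS/dZ for S = sum(Z) is a matrix of ones; dS/dX = dS/dZ @ W.T
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
dSdZ = torch.ones(2, 2)
print(dSdZ @ W.t())
# tensor([[ 3.6000, -0.9000,  1.3000],
#         [ 3.6000, -0.9000,  1.3000]])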

PyTorch does not save gradients of intermediate results for performance reasons. So you will only get the gradient for those tensors for which you set requires_grad to True.

However, you can use register_hook to extract the intermediate grad during the calculation or to save it manually. Here I just save it to the grad attribute of tensor Z:

import torch

# function to extract grad
def set_grad(var):
    def hook(grad):
        var.grad = grad
    return hook

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])
Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)

# register_hook for Z
Z.register_hook(set_grad(Z))

S = torch.sum(Z)
S.backward()
print("Z:\n", Z)
print("gZ:\n", Z.grad)
print("gX:\n", X.grad)

This will output:

Z:
 tensor([[2.1500, 2.9100],
        [1.6000, 1.2600]], grad_fn=<ThAddmmBackward>)
gZ:
 tensor([[1., 1.],
        [1., 1.]])
gX:
 tensor([[ 3.6000, -0.9000,  1.3000],
        [ 3.6000, -0.9000,  1.3000]])
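
As an aside, PyTorch also has Tensor.retain_grad() for this purpose; if it is available in your version, a sketch along these lines should give the same Z.grad without writing a hook:

import torch

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]], requires_grad=True)
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]])
B = torch.tensor([1.1, -0.3])

Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
Z.retain_grad()          # ask autograd to keep the grad of this non-leaf tensor

S = torch.sum(Z)
S.backward()
print("gZ:\n", Z.grad)   # same ones matrix as with the hook above
print("gX:\n", X.grad)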

Hope this helps!

Btw.: normally you would want gradients to be activated for your parameters, i.e. your weights and biases, because with the current setup an optimizer would be altering your input X and not your weights W and bias B. So usually gradients are activated for W and B in such a case.
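
A minimal sketch of that more typical setup (the SGD optimizer and learning rate here are just illustrative choices): requires_grad is set on W and B, an optimizer updates them, and X stays a plain input:

import torch

X = torch.tensor([[0.5, 0.3, 2.1], [0.2, 0.1, 1.1]])      # plain input, no gradient
W = torch.tensor([[2.1, 1.5], [-1.4, 0.5], [0.2, 1.1]], requires_grad=True)
B = torch.tensor([1.1, -0.3], requires_grad=True)

optimizer = torch.optim.SGD([W, B], lr=0.01)               # illustrative optimizer/lr

Z = torch.nn.functional.linear(X, weight=W.t(), bias=B)
S = torch.sum(Z)                                           # stand-in for a real loss
S.backward()
print("gW:\n", W.grad)
print("gB:\n", B.grad)

optimizer.step()                                           # updates W and B, not X
optimizer.zero_grad()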
