How to use PyTorch's autograd efficiently with tensors?


Question

In my previous question I found out how to use PyTorch's autograd to differentiate, and it worked:

#autograd
import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module): 
        def __init__(self):
            super(net_x, self).__init__()
            self.fc1=nn.Linear(1, 20) 
            self.fc2=nn.Linear(20, 20)
            self.out=nn.Linear(20, 4) 

        def forward(self, x):
            x=torch.tanh(self.fc1(x))
            x=torch.tanh(self.fc2(x))
            x=self.out(x)
            return x

nx = net_x()
r = torch.tensor([1.0], requires_grad=True)
print('r', r)
y = nx(r)
print('y', y)
print('')
for i in range(y.shape[0]):
    # prints the vector (dy_i/dr_0, dy_i/dr_1, ... dy_i/dr_n)
    print(grad(y[i], r, retain_graph=True))

>>>
r tensor([1.], requires_grad=True)
y tensor([ 0.1698, -0.1871, -0.1313, -0.2747], grad_fn=<AddBackward0>)

(tensor([-0.0124]),)
(tensor([-0.0952]),)
(tensor([-0.0433]),)
(tensor([-0.0099]),)

The problem I currently have is that I have to differentiate a very large tensor, and iterating through it the way I'm doing now (for i in range(y.shape[0])) takes forever. The reason I'm iterating is that, from my understanding, grad only knows how to propagate gradients from a scalar tensor, which y is not, so I need to compute the gradient of each coordinate of y separately.
I know that TensorFlow is capable of differentiating tensors, from here:

tf.gradients(
    ys, xs, grad_ys=None, name='gradients', gate_gradients=False,
    aggregation_method=None, stop_gradients=None,
    unconnected_gradients=tf.UnconnectedGradients.NONE
)
"ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by the ys. The list must be the same length as ys.

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys and for x in xs."
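
(For reference, a minimal PyTorch sketch of the same vector-Jacobian behaviour, added here as an illustration rather than part of the original question: the grad_outputs argument of torch.autograd.grad plays the role of grad_ys, so a vector of ones yields the same sum(dy/dx) in a single backward pass.)

import torch
from torch.autograd import grad

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x ** 2                                    # non-scalar output
# grad_outputs acts like TF's grad_ys; ones give sum(dy/dx) for each input entry
(dydx,) = grad(y, x, grad_outputs=torch.ones_like(y))
print(dydx)                                   # tensor([2., 4.])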

I was hoping that there's a more efficient way to differentiate tensors in PyTorch.

For example (illustrative pseudocode):

a = range(100)
b = range(100)
c = range(100)
d = range(100)
my_tensor = torch.tensor([a,b,c,d])

t = range(100)

#derivative = grad(my_tensor, t) --> not working

#Instead what I'm currently doing:
for i in range(len(t)):
    a_grad = grad(a[i],t[i], retain_graph=True)
    b_grad = grad(b[i],t[i], retain_graph=True)
    #etc.

I was told that it might work if I could run autograd on the forward pass rather than the backward pass, but from here it seems like that's not currently a feature PyTorch has.
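
(A hedged aside, not part of the original question: although true forward-mode autograd wasn't exposed at the time, torch.autograd.functional.jvp emulates a forward-mode Jacobian-vector product via a double-backward trick. A minimal sketch, assuming the network nx defined above and that each batch row is processed independently:)

import torch
from torch.autograd.functional import jvp

t = torch.rand(10, 1)
# jvp(f, x, v) returns (f(x), J(x) @ v); with v = ones and row-wise independence,
# each entry of the product is the derivative of that output w.r.t. its own input
y, dy_dt = jvp(nx, t, torch.ones_like(t))
print(dy_dt.shape)                            # torch.Size([10, 4])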

Update 1:
@jodag mentioned that what I'm looking for might be just the diagonal of the Jacobian. I'm following the link he attached and trying out the faster method. However, this doesn't seem to work and gives me an error: RuntimeError: grad can be implicitly created only for scalar outputs. Code:

nx = net_x()
x = torch.rand(10, requires_grad=True)
x = torch.reshape(x, (10,1))
x = x.unsqueeze(1).repeat(1, 4, 1)
y = nx(x)
dx = torch.diagonal(torch.autograd.grad(torch.diagonal(y, 0, -2, -1), x), 0, -2, -1)
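
(A hedged note on the error above, added for illustration: torch.autograd.grad raises this RuntimeError because the selected outputs form a non-scalar tensor and no grad_outputs was supplied; it also returns a tuple, which the outer torch.diagonal cannot consume. A minimal sketch of one way to make the call legal, by supplying an explicit grad_outputs vector, which turns it into a vector-Jacobian product:)

diag_y = torch.diagonal(y, 0, -2, -1)                  # shape (10, 4), not a scalar
# in this particular setup each y[i, j, j] depends only on x[i, j, 0],
# so this vector-Jacobian product with ones coincides with the Jacobian diagonal
dx = torch.autograd.grad(diag_y, x,
                         grad_outputs=torch.ones_like(diag_y))[0]
print(dx.shape)                                        # torch.Size([10, 4, 1])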

Answer

I believe I solved it using @jodag's advice: simply calculate the Jacobian and take the diagonal.
Consider the following network:

import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module): 
        def __init__(self):
            super(net_x, self).__init__()
            self.fc1=nn.Linear(1, 20) 
            self.fc2=nn.Linear(20, 20)
            self.out=nn.Linear(20, 4) #a,b,c,d

        def forward(self, x):
            x=torch.tanh(self.fc1(x))
            x=torch.tanh(self.fc2(x))
            x=self.out(x)
            return x

nx = net_x()

#input
t = torch.tensor([1.0, 2.0, 3.2], requires_grad = True) #input vector
t = torch.reshape(t, (3,1)) #reshape for batch

My approach so far was to iterate through the input since grad wants a scalar value as mentioned above:

#method 1
for timestep in t:
    y = nx(timestep) 
    print(grad(y[0],timestep, retain_graph=True)) #0 for the first vector (i.e "a"), 1 for the 2nd vector (i.e "b")

>>>
(tensor([-0.0142]),)
(tensor([-0.0517]),)
(tensor([-0.0634]),)

Using the diagonal of the Jacobian seems more efficient and gives the same results:

#method 2
dx = torch.autograd.functional.jacobian(lambda t_: nx(t_), t)
dx = torch.diagonal(torch.diagonal(dx, 0, -1), 0)[0] #first vector
#dx = torch.diagonal(torch.diagonal(dx, 1, -1), 0)[0] #2nd vector
#dx = torch.diagonal(torch.diagonal(dx, 2, -1), 0)[0] #3rd vector
#dx = torch.diagonal(torch.diagonal(dx, 3, -1), 0)[0] #4th vector
dx

>>>
tensor([-0.0142, -0.0517, -0.0634])
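
A further hedged sketch, not part of the accepted answer: because each row of the output depends only on the corresponding row of t, summing each output column over the batch before calling grad recovers the same diagonal in one backward pass per column, without building the full Jacobian.

#method 3 (sketch): exploit the row-wise independence of the batch
y = nx(t)                                             # shape (3, 4)
dx_a = grad(y[:, 0].sum(), t, retain_graph=True)[0]   # d a_i / d t_i, shape (3, 1)
dx_b = grad(y[:, 1].sum(), t, retain_graph=True)[0]   # d b_i / d t_i
print(dx_a.flatten())                                 # matches the tensor above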
