Backward function in PyTorch
Question
I have a question about PyTorch's backward function; I don't think I'm getting the right output:
import numpy as np
import torch
from torch.autograd import Variable
a = Variable(torch.FloatTensor([[1,2,3],[4,5,6]]), requires_grad=True)
out = a * a
out.backward(a)
print(a.grad)
the output is
tensor([[ 2., 8., 18.],
[32., 50., 72.]])
maybe it's 2*a*a
but I think the output is supposed to be
tensor([[ 2., 4., 6.],
[8., 10., 12.]])
i.e. 2*a, since d(x^2)/dx = 2x.
Answer

Please read the documentation on backward() carefully to better understand it.
By default, PyTorch expects backward() to be called for the last output of the network, i.e. the loss function. The loss function always outputs a scalar, and therefore the gradients of the scalar loss w.r.t. all other variables/parameters are well defined (using the chain rule).
Thus, by default, backward() is called on a scalar tensor and expects no arguments.
For example:
a = torch.tensor([[1,2,3],[4,5,6]], dtype=torch.float, requires_grad=True)
for i in range(2):
    for j in range(3):
        out = a[i,j] * a[i,j]
        out.backward()   # scalar output, so no argument needed; gradients accumulate in a.grad
print(a.grad)
yields
tensor([[ 2., 4., 6.], [ 8., 10., 12.]])
As expected: d(a^2)/da = 2a.
However, when you call backward on the 2-by-3 out tensor (no longer a scalar function), what do you expect a.grad to be? You would actually need a 2-by-3-by-2-by-3 output: d out[i,j] / d a[k,l] (!)
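To see how big that object is, here is a small sketch (assuming a recent PyTorch that ships torch.autograd.functional.jacobian) that materialises the full Jacobian of the element-wise square and prints its shape:

import torch

# Full derivative d out[i,j] / d a[k,l] of the element-wise square:
# a 2-by-3 output w.r.t. a 2-by-3 input gives a 2-by-3-by-2-by-3 Jacobian.
a = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
J = torch.autograd.functional.jacobian(lambda x: x * x, a)
print(J.shape)  # torch.Size([2, 3, 2, 3])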
PyTorch does not support such non-scalar derivatives directly. Instead, PyTorch assumes out is only an intermediate tensor and that somewhere "upstream" there is a scalar loss function which, through the chain rule, provides d loss / d out[i,j]. This "upstream" gradient is of size 2-by-3, and this is actually the argument you provide to backward in this case: out.backward(g), where g_ij = d loss / d out_ij.
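To make the role of that argument concrete, here is a small sketch (the weight tensor w below is made up purely for illustration): a scalar loss computed from out gives the same a.grad via loss.backward() as calling out.backward(g) directly with g = d loss / d out.

import torch

w = torch.tensor([[1., 0., 2.], [0., 1., 0.]])   # arbitrary illustrative weights

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
out = a * a
loss = (w * out).sum()      # scalar loss; d loss / d out[i,j] = w[i,j]
loss.backward()
print(a.grad)               # 2 * w * a

b = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
(b * b).backward(w)         # supply the "upstream" gradient directly
print(b.grad)               # same values as above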
The gradients are then calculated by the chain rule: d loss / d a[i,j] = (d loss / d out[i,j]) * (d out[i,j] / d a[i,j]).
Since you provided a as the "upstream" gradient, you got a.grad[i,j] = 2 * a[i,j] * a[i,j].
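A quick check of that claim, assuming a 2-by-3 a as in the question (nothing here beyond standard tensor ops):

import torch

a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
out = a * a
out.backward(a)                                   # "upstream" gradient g = a
print(a.grad)                                     # tensor([[ 2.,  8., 18.], [32., 50., 72.]])
print(torch.equal(a.grad, 2 * a.detach() ** 2))   # True: g_ij * 2*a_ij = 2*a_ij^2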
If you were instead to provide an all-ones "upstream" gradient,
out.backward(torch.ones(2,3))
print(a.grad)
yields
tensor([[ 2., 4., 6.], [ 8., 10., 12.]])
As expected.
It's all in the chain rule.