Back-Propagation of y = x / sum(x, dim=0) where size of tensor x is (H,W)


Problem description

Q1.

I'm trying to make my custom autograd function with pytorch.

But I had a problem deriving the analytical back-propagation for y = x / sum(x, dim=0),

where the size of tensor x is (Height, Width) (x is 2-dimensional).

Here's my code

class MyFunc(torch.autograd.Function):
  @staticmethod
  def forward(ctx, input):
    ctx.save_for_backward(input)
    input = input / torch.sum(input, dim=0)

    return input

  @staticmethod
  def backward(ctx, grad_output):
    input = ctx.saved_tensors[0]
    H, W = input.size()
    sum = torch.sum(input, dim=0)
    grad_input = grad_output * (1/sum - input*1/sum**2)

    return grad_input

I used gradcheck (from torch.autograd) to compare the Jacobian matrices,

from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.randn(3,3,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)

and the result was that gradcheck failed (screenshot omitted).

Please someone help me to get correct back propagation result

Thanks!


Q2.

Thanks for answers!

Thanks to your help, I could implement back-propagation for the (H,W) tensor case.

However, when I implemented back-propagation for the (N,H,W) tensor case, I ran into a problem. I think the problem is how I initialize the new tensor.

Here's my new code

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyFunc(torch.autograd.Function):
  @staticmethod
  def forward(ctx, input):
    ctx.save_for_backward(input)
    
    N = input.size(0)
    for n in range(N):
      input[n] /= torch.sum(input[n], dim=0)

    return input

  @staticmethod
  def backward(ctx, grad_output):
    input = ctx.saved_tensors[0]
    N, H, W = input.size()
    I = torch.eye(H).unsqueeze(-1)
    sum = input.sum(1)

    grad_input = torch.zeros((N,H,W), dtype = torch.double, requires_grad=True)
    for n in range(N):
      grad_input[n] = ((sum[n] * I - input[n]) * grad_output[n] / sum[n]**2).sum(1)

    return grad_input

Gradcheck code is

from torch.autograd import gradcheck
func = MyFunc.apply
input = (torch.rand(2,2,2,dtype=torch.double,requires_grad=True))
test = gradcheck(func, input)
print(test)

and the result is an error (screenshot omitted).

I don't know why the error occurs...

Your help would be very valuable as I implement my own convolutional network.

Thanks! Have a nice day.

Solution

Let's look at an example with a single column, for instance: [[x1], [x2], [x3]].

Let sum be x1 + x2 + x3; then normalizing x gives y = [[y1], [y2], [y3]] = [[x1/sum], [x2/sum], [x3/sum]]. You're looking for dL/dx1, dL/dx2, and dL/dx3 - we'll just write them as dx1, dx2, and dx3. We use the same shorthand for all dL/dyi (written dyi).

So dx1 is equal to dL/dy1*dy1/dx1 + dL/dy2*dy2/dx1 + dL/dy3*dy3/dx1. That's because x1 contributes to all output elements in the corresponding column: y1, y2, and y3.

We have:

  • dy1/dx1 = d(x1/sum)/dx1 = (sum - x1)/sum²

  • dy2/dx1 = d(x2/sum)/dx1 = -x2/sum²

  • similarly, dy3/dx1 = d(x3/sum)/dx1 = -x3/sum²

Therefore dx1 = (sum - x1)/sum²*dy1 - x2/sum²*dy2 - x3/sum²*dy3. Same for dx2 and dx3. As a result, the Jacobian is [dxi]_i = (sum - xi)/sum² and [dxi]_j = -xj/sum² (for all j different to i).
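
In index notation (with δ_ij the Kronecker delta and dyi the shorthand from above), the same derivation can be summarized as:

$$
\frac{\partial y_i}{\partial x_j} = \frac{\delta_{ij}\,\mathrm{sum} - x_i}{\mathrm{sum}^2},
\qquad
\frac{\partial L}{\partial x_j} = \sum_i dy_i\,\frac{\partial y_i}{\partial x_j}
= \frac{dy_j}{\mathrm{sum}} - \frac{\sum_i x_i\,dy_i}{\mathrm{sum}^2}
$$

The second identity is the per-column closed form that a vectorized backward can implement directly.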

In your implementation, you seem to be missing all non-diagonal components.

Keeping the same one-column example, with x1=2, x2=3, and x3=5:

>>> x = torch.tensor([[2.], [3.], [5.]])

>>> sum = x.sum(0)
tensor([10.])

The Jacobian will be:

>>> J = (sum*torch.eye(x.size(0)) - x)/sum**2
tensor([[ 0.0800, -0.0200, -0.0200],
        [-0.0300,  0.0700, -0.0300],
        [-0.0500, -0.0500,  0.0500]])
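
As a quick numerical sanity check, torch.autograd.functional.jacobian applied to the same normalization should give back this exact matrix (after squeezing out the singleton column dimension):

>>> from torch.autograd.functional import jacobian
>>> jacobian(lambda t: t / t.sum(0), x).squeeze()
tensor([[ 0.0800, -0.0200, -0.0200],
        [-0.0300,  0.0700, -0.0300],
        [-0.0500, -0.0500,  0.0500]])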


For an implementation with multiple columns, it's a bit trickier, more specifically regarding the shape of the diagonal matrix. It's easier to keep the column axis last so we don't have to bother with broadcasting:

>>> x = torch.tensor([[2., 1], [3., 3], [5., 5]])
>>> sum = x.sum(0)
tensor([10.,  9.])

>>> diag = sum*torch.eye(3).unsqueeze(-1).repeat(1, 1, len(sum))
tensor([[[10.,  9.],
         [ 0.,  0.],
         [ 0.,  0.]],

        [[ 0.,  0.],
         [10.,  9.],
         [ 0.,  0.]],

        [[ 0.,  0.],
         [ 0.,  0.],
         [10.,  9.]]])

Above diag has a shape of (3, 3, 2) where the two columns are on the last axis. Notice how we didn't need to broadcast sum.

What I would avoid is torch.eye(3).unsqueeze(0).repeat(len(sum), 1, 1): with that kind of shape - (2, 3, 3) - you would have to use sum[:, None, None] and would need further broadcasting down the road...
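
For comparison, a sketch of that column-first layout (diag_cf is just an illustrative name); note the extra indexing needed before sum broadcasts:

>>> diag_cf = sum[:, None, None] * torch.eye(3).unsqueeze(0).repeat(len(sum), 1, 1)
>>> diag_cf.shape
torch.Size([2, 3, 3])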

The Jacobian is simply:

>>> J = (diag - x)/sum**2
tensor([[[ 0.0800,  0.0988],
         [-0.0300, -0.0370],
         [-0.0500, -0.0617]],

        [[-0.0200, -0.0123],
         [ 0.0700,  0.0741],
         [-0.0500, -0.0617]],

        [[-0.0200, -0.0123],
         [-0.0300, -0.0370],
         [ 0.0500,  0.0494]]])

You can check the results by backpropagating through the operation with an arbitrary dy vector (not with torch.ones though; you'll get all zeros because of the structure of J!). After backpropagating, x.grad should be equal to torch.einsum('abc,bc->ac', J, dy).
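
Putting this together, a minimal sketch of a vectorized backward for the (H, W) case, reusing the MyFunc name from Q1 and collapsing the per-column Jacobian-vector product into dL/dx_j = dy_j/sum - (Σ_i x_i*dy_i)/sum² (which follows from the derivation above); the gradcheck call and the einsum comparison are the two checks described in this answer, and names like s, x2, and dy are only illustrative:

import torch
from torch.autograd import gradcheck

class MyFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Save the unmodified input and avoid in-place ops on it.
        ctx.save_for_backward(input)
        return input / torch.sum(input, dim=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        s = input.sum(dim=0, keepdim=True)  # column sums, shape (1, W)
        # Per column: dL/dx_j = dy_j/s - (sum_i x_i*dy_i)/s**2
        return grad_output / s - (grad_output * input).sum(dim=0, keepdim=True) / s**2

# Check 1: gradcheck on a random double-precision input.
x = torch.rand(3, 3, dtype=torch.double, requires_grad=True)
print(gradcheck(MyFunc.apply, (x,)))  # expected: True

# Check 2: backpropagate an arbitrary dy and compare x.grad against the Jacobian built as above.
x2 = torch.tensor([[2., 1], [3., 3], [5., 5]], requires_grad=True)
dy = torch.randn(3, 2)
MyFunc.apply(x2).backward(dy)
s2 = x2.detach().sum(0)
J = (s2 * torch.eye(3).unsqueeze(-1).repeat(1, 1, len(s2)) - x2.detach()) / s2**2
print(torch.allclose(x2.grad, torch.einsum('abc,bc->ac', J, dy)))  # expected: True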
