How to apply gradients manually in PyTorch


Question

Starting to learn PyTorch, I was trying something very simple: moving a randomly initialized vector of size 5 toward the target vector [1, 2, 3, 4, 5].

But the distance is not decreasing, and my vector x just goes crazy. No idea what I am missing.

import torch
import numpy as np
from torch.autograd import Variable

# regress a vector to the goal vector [1,2,3,4,5]

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

x = Variable(torch.rand(5).type(dtype), requires_grad=True)
target = Variable(torch.FloatTensor([1,2,3,4,5]).type(dtype),
                  requires_grad=False)
distance = torch.mean(torch.pow((x - target), 2))

for i in range(100):
  distance.backward(retain_graph=True)
  x_grad = x.grad
  x.data.sub_(x_grad.data * 0.01)

Answer

There are two errors in your code that prevent you from getting the desired result.

The first error is that you should put the distance calculation inside the loop. The distance is the loss in this case, so it has to be recomputed from the current x, and its change monitored, on every iteration.
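
To see why: building the loss once outside the loop freezes the forward graph, so each backward() call recomputes the gradient from the values saved when the graph was built, and updating x has no effect on later gradients. A minimal sketch (not from the original post) illustrating this:

import torch
from torch.autograd import Variable

x = Variable(torch.rand(5), requires_grad=True)
target = Variable(torch.FloatTensor([1, 2, 3, 4, 5]), requires_grad=False)

# the graph is built once, saving the intermediate (x - target) as it is here
distance = torch.mean(torch.pow((x - target), 2))

distance.backward(retain_graph=True)
g1 = x.grad.data.clone()

x.data.sub_(0.01 * g1)                # x has moved ...
distance.backward(retain_graph=True)  # ... but backward reuses the stale graph
g2 = x.grad.data - g1                 # gradient added by the second backward

print(torch.equal(g1, g2))            # True: the same stale gradient every time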

The second error is that you should manually zero out x.grad, because PyTorch won't zero the gradient stored in a Variable by default: every backward() call accumulates into it.
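
A small sketch (again, not from the original post) showing the accumulation:

import torch
from torch.autograd import Variable

x = Variable(torch.ones(3), requires_grad=True)
y = (x * 2).sum()

y.backward(retain_graph=True)
print(x.grad.data)   # 2 2 2  -- gradient from the first backward

y.backward()         # without zeroing, gradients ACCUMULATE into x.grad
print(x.grad.data)   # 4 4 4  -- not 2 2 2

x.grad.data.zero_()  # the manual reset used in the fixed loop below
print(x.grad.data)   # 0 0 0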

The following is an example that works as expected:

import torch
import numpy as np
from torch.autograd import Variable
import matplotlib.pyplot as plt

# regress a vector to the goal vector [1,2,3,4,5]

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

x = Variable(torch.rand(5).type(dtype), requires_grad=True)
target = Variable(torch.FloatTensor([1,2,3,4,5]).type(dtype),
                  requires_grad=False)

lr = 0.01 # the learning rate

d = []
for i in range(1000):
  distance = torch.mean(torch.pow((x - target), 2)) # recompute the loss from the current x
  d.append(distance.data)                           # record it for plotting
  distance.backward(retain_graph=True)

  x.data.sub_(lr * x.grad.data) # gradient-descent step
  x.grad.data.zero_()           # reset the accumulated gradient

print(x.data)

fig, ax = plt.subplots()
ax.plot(d)
ax.set_xlabel("iteration")
ax.set_ylabel("distance")
plt.show()

The following is the graph of distance w.r.t. iteration:

We can see that the model converges at about 600 iterations. If we set the learning rate higher (e.g., lr = 0.1), the model converges much faster (about 60 iterations; see the image below).

Now x becomes

0.9878 1.9749 2.9624 3.9429 4.9292

which is pretty close to your target of [1, 2, 3, 4, 5].
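
For reference, in PyTorch 0.4 and later Variable has been merged into Tensor, so the same manual update can be written with plain tensors and torch.no_grad(). A minimal sketch of the equivalent loop (not part of the original answer):

import torch

# regress a vector to the goal vector [1,2,3,4,5], modern-PyTorch style
x = torch.rand(5, requires_grad=True)
target = torch.tensor([1., 2., 3., 4., 5.])

lr = 0.1
for i in range(100):
    distance = torch.mean((x - target) ** 2)  # rebuild the loss from the current x
    distance.backward()
    with torch.no_grad():   # update x without recording the step in the graph
        x -= lr * x.grad
    x.grad.zero_()          # reset the accumulated gradient

print(x)

Equivalently, torch.optim.SGD([x], lr=lr) replaces the two manual update lines with optimizer.step() and optimizer.zero_grad().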
