Pytorch, what are the gradient arguments


Problem Description


I am reading through the documentation of PyTorch and found an example where they write

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print(x.grad)

where x was an initial variable, from which y was constructed (a 3-vector). The question is, what are the 0.1, 1.0 and 0.0001 arguments of the gradients tensor? The documentation is not very clear on that.

Solution

I can no longer find the original code on the PyTorch website.

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print(x.grad)

The problem with the code above is that it shows no function from which the gradients are calculated. This means we don't know how many parameters (arguments) the function takes, or what their dimensions are.
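
For illustration, here is a self-contained stand-in; the function y = x * 2 is only an assumption, since the question does not show how y was actually built from x:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad = True)  # hypothetical x
y = x * 2  # assumed stand-in for whatever the docs computed from x

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)  # weights each component of dy/dx by the given vector
print(x.grad)          # tensor([2.0000e-01, 2.0000e+00, 2.0000e-04]), since dy/dx = 2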

To fully understand this I created several examples close to the original:

Example 1:

import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad = True)
b = torch.tensor([3.0, 4.0, 5.0], requires_grad = True)
c = torch.tensor([6.0, 7.0, 8.0], requires_grad = True)

y=3*a + 2*b*b + torch.log(c)
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients,retain_graph=True)

print(a.grad) # tensor([3.0000e-01, 3.0000e+00, 3.0000e-04])
print(b.grad) # tensor([1.2000e+00, 1.6000e+01, 2.0000e-03])
print(c.grad) # tensor([1.6667e-02, 1.4286e-01, 1.2500e-05])

As you can see, in this first example I assumed our function is y=3*a + 2*b*b + torch.log(c) and the parameters are tensors with three elements each.
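
Those numbers can be checked by hand: since y is computed element-wise, dy/da = 3, dy/db = 4*b and dy/dc = 1/c, and backward(gradients) multiplies each of these element-wise by the vector that was passed in. A minimal sketch of that check, using the same a, b, c and gradients as in Example 1:

import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad = True)
b = torch.tensor([3.0, 4.0, 5.0], requires_grad = True)
c = torch.tensor([6.0, 7.0, 8.0], requires_grad = True)

y = 3*a + 2*b*b + torch.log(c)
v = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(v)

# Each .grad is the local derivative weighted element-wise by v.
print(torch.allclose(a.grad, 3 * v))               # True
print(torch.allclose(b.grad, 4 * b.detach() * v))  # True
print(torch.allclose(c.grad, v / c.detach()))      # True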

But there is another option:

Example 2:

import torch

a = torch.tensor(1.0, requires_grad = True)
b = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(1.0, requires_grad = True)

y=3*a + 2*b*b + torch.log(c)    
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(a.grad) # tensor(3.3003)
print(b.grad) # tensor(4.4004)
print(c.grad) # tensor(1.1001)

Here y is a scalar, so the three values in gradients = torch.FloatTensor([0.1, 1.0, 0.0001]) are effectively summed: each leaf's local derivative (3, 4 and 1 at a = b = c = 1) is multiplied by 0.1 + 1.0 + 0.0001 = 1.1001, giving 3.3003, 4.4004 and 1.1001. In that sense the gradients tensor acts as an accumulator of weights. (Recent PyTorch versions check that the gradient argument matches the shape of y, so this scalar-y variant relies on older broadcasting behaviour.)

The next example provides identical results.

Example 3:

a = torch.tensor(1.0, requires_grad = True)
b = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(1.0, requires_grad = True)

y=3*a + 2*b*b + torch.log(c)

gradients = torch.FloatTensor([0.1])
y.backward(gradients,retain_graph=True)
gradients = torch.FloatTensor([1.0])
y.backward(gradients,retain_graph=True)
gradients = torch.FloatTensor([0.0001])
y.backward(gradients)

print(a.grad) # tensor(3.3003)
print(b.grad) # tensor(4.4004)
print(c.grad) # tensor(1.1001)
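
Because backward() adds into .grad, those three weighted passes accumulate exactly what a single pass weighted by their sum would produce. A minimal sketch of that equivalence:

import torch

a = torch.tensor(1.0, requires_grad = True)
b = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(1.0, requires_grad = True)
y = 3*a + 2*b*b + torch.log(c)

# One pass weighted by 0.1 + 1.0 + 0.0001 accumulates the same totals.
y.backward(torch.tensor(1.1001))

print(a.grad) # tensor(3.3003)
print(b.grad) # tensor(4.4004)
print(c.grad) # tensor(1.1001)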

As you may have heard, the PyTorch autograd system's calculation is equivalent to a Jacobian product.

In case you have a function, like we did:

y=3*a + 2*b*b + torch.log(c)

The Jacobian would be [3, 4*b, 1/c]. However, PyTorch does not build this Jacobian explicitly when it calculates the gradients at a certain point.

For the previous function, PyTorch evaluates δy/δb at the concrete value b currently holds (here b=1): each recorded operation applies its own derivative rule at that value, and the results are chained together. So there is nothing like symbolic math involved; you always get the gradient at a specific point.
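
A minimal sketch of that point-wise behaviour: the same derivative rule (4*b for 2*b*b) is simply evaluated at whatever value b holds when backward() runs:

import torch

# The derivative of 2*b*b is 4*b, evaluated at the current value of b.
for b_val in (1.0, 2.0, 5.0):
    b = torch.tensor(b_val, requires_grad = True)
    y = 2 * b * b
    y.backward()
    print(b.grad)  # tensor(4.), tensor(8.), tensor(20.)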

If you don't pass a gradients argument to y.backward():

Example 4:

a = torch.tensor(0.1, requires_grad = True)
b = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(0.1, requires_grad = True)
y=3*a + 2*b*b + torch.log(c)

y.backward()

print(a.grad) # tensor(3.)
print(b.grad) # tensor(4.)
print(c.grad) # tensor(10.)

You simply get the result at a point, based on how you initially set your a, b, c tensors.
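
For a scalar y, omitting the argument is equivalent to passing a gradient of 1.0, so the numbers above are just δy/δa = 3, δy/δb = 4*b = 4 and δy/δc = 1/c = 10 at the chosen starting values. A minimal sketch of that equivalence:

import torch

a = torch.tensor(0.1, requires_grad = True)
b = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(0.1, requires_grad = True)
y = 3*a + 2*b*b + torch.log(c)

# For a scalar output, backward() defaults to a gradient of 1.0,
# so this call matches the plain y.backward() above.
y.backward(torch.tensor(1.0))

print(a.grad) # tensor(3.)
print(b.grad) # tensor(4.)
print(c.grad) # tensor(10.)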

Be careful how you initialize your a, b, c:

Example 5:

a = torch.empty(1, requires_grad = True, pin_memory=True)
b = torch.empty(1, requires_grad = True, pin_memory=True)
c = torch.empty(1, requires_grad = True, pin_memory=True)

y=3*a + 2*b*b + torch.log(c)

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(a.grad) # tensor([3.3003])
print(b.grad) # tensor([0.])
print(c.grad) # tensor([inf])

Because torch.empty() returns uninitialized memory, the starting values of a, b and c are arbitrary, so the gradients printed above come from one particular run and may differ every time (pin_memory=True does not change that; it only allocates page-locked memory).
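
If you want reproducible numbers, initialize the leaves with explicit values (or a seeded random generator) instead of torch.empty(). A minimal sketch:

import torch

a = torch.ones(1, requires_grad = True)
b = torch.ones(1, requires_grad = True)
c = torch.ones(1, requires_grad = True)

y = 3*a + 2*b*b + torch.log(c)

gradients = torch.FloatTensor([0.1])
y.backward(gradients)

print(a.grad) # tensor([0.3000])
print(b.grad) # tensor([0.4000])
print(c.grad) # tensor([0.1000])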

Also, note that gradients accumulate across backward calls, so zero them when needed.

Example 6:

a = torch.tensor(1.0, requires_grad = True)
b = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(1.0, requires_grad = True)
y=3*a + 2*b*b + torch.log(c)

y.backward(retain_graph=True)
y.backward()

print(a.grad) # tensor(6.)
print(b.grad) # tensor(8.)
print(c.grad) # tensor(2.)
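
To reset the accumulator between backward passes, zero the .grad tensors explicitly (in a training loop this is typically done with optimizer.zero_grad()). A minimal sketch:

import torch

a = torch.tensor(1.0, requires_grad = True)
b = torch.tensor(1.0, requires_grad = True)
c = torch.tensor(1.0, requires_grad = True)
y = 3*a + 2*b*b + torch.log(c)

y.backward(retain_graph=True)
for t in (a, b, c):
    t.grad.zero_()   # reset the accumulator before the next pass
y.backward()

print(a.grad) # tensor(3.)
print(b.grad) # tensor(4.)
print(c.grad) # tensor(1.)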

Lastly I just wanted to state some terms PyTorch uses:

PyTorch creates a dynamic computational graph when calculating the gradients. This looks much like a tree.

So you will often hear that the leaves of this tree are the input tensors and the root is the output tensor.

Gradients are calculated by tracing the graph from the root to the leaves, multiplying every gradient along the way using the chain rule.
