Why do 'loss.backward()' and 'weight.grad' return a tensor containing all zeros?


Problem description

When I run 'loss.backward()' and then check 'weight.grad', I get a tensor containing all zeros. Also, 'weight.grad_fn' returns None.

However, everything seems to return the correct result for the second layer 'w2'. And if I play with simple operations such as x*2 or x**2, 'backward()' and '.grad' return correct results.
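For example, a minimal standalone check of that kind (just an illustration, not part of the script below) produces the expected gradient:

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()

y.backward()
print(y.grad_fn)  # <SumBackward0 object at ...>
print(x.grad)     # tensor([4., 6.]), i.e. 2 * x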

Here is my code:

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms

# Getting MNIST data
num_workers = 0
batch_size = 64
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
dataiter = iter(train_loader)
images, labels = next(dataiter)  # dataiter.next() no longer works in recent PyTorch

#####################################
#####################################
#### NN Part

def activation(x):
    return 1/(1+torch.exp(-x))

# images is already a tensor, so torch.from_numpy() is not needed.
# Flatten the inputs from (64, 1, 28, 28) into (64, 784)
inputs = images.view(images.shape[0], -1)


w1 = torch.randn(784, 256, requires_grad=True)  # n_input, n_hidden
b1 = torch.randn(256)                           # n_hidden

w2 = torch.randn(256, 10, requires_grad=True)   # n_hidden, n_output
b2 = torch.randn(10)                            # n_output

h = activation(torch.mm(inputs, w1) + b1)
y = torch.mm(h, w2) + b2

#print(h)
#print(y)

y.sum().backward()
print(w1.grad)
print(w1.grad_fn)
#print(w2.grad)
#print(w2.grad_fn)

By the way, it gives me the same problem if I try to run it this way as well:

images = images.reshape(images.shape[0], -1)

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

logits = model(images)
criterion = nn.NLLLoss()

loss = criterion(logits, labels)
print(loss)
print(loss.grad_fn)


print('Before backward pass: ', model[0].weight.grad)
loss.backward()
print('After: ', model[0].weight.grad)
#print('After: ', model[2].weight.grad)
#print('After: ', model[4].weight.grad)

Answer

The gradients of w1 are not all zero; there are simply a lot of zeros, especially around the border, because the MNIST images have a lot of black pixels (zeros). When multiplying with zero, the resulting gradients are also zero.
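To see why, note that for y = (x @ w).sum() the gradient with respect to w[i, j] is the sum of x[:, i] over the batch, so an input column that is zero for every sample gives a row of zeros in w.grad. A minimal toy sketch (not using MNIST) of this effect:

import torch

# Toy batch: 3 samples, 4 features; feature 0 is zero for every sample
x = torch.tensor([[0., 1., 2., 3.],
                  [0., 4., 5., 6.],
                  [0., 7., 8., 9.]])
w = torch.randn(4, 2, requires_grad=True)

(x @ w).sum().backward()

# Row 0 of w.grad is all zeros: the gradient of w[0, j] is x[:, 0].sum() == 0
print(w.grad)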

By printing w1.grad you only see a very small part of the values (the border region), and in that truncated output you just can't see the non-zero values.

w1.grad
# => tensor([[0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            ...,
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.],
#            [0., 0., 0.,  ..., 0., 0., 0.]])

# Indices of non-zero elements
w1.grad.nonzero()
# => tensor([[ 71,   0],
#            [ 71,   1],
#            [ 71,   2],
#            ...,
#            [746, 253],
#            [746, 254],
#            [746, 255]])
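As an additional check (a sketch reusing the inputs and w1 variables from the question's first script), the rows of w1.grad that contain non-zero entries should line up with the pixels that are non-zero for at least one image in the batch:

# Pixels that are non-zero for at least one image in the batch
active_pixels = inputs.abs().sum(dim=0) > 0        # shape (784,)

# Rows of w1.grad that contain at least one non-zero entry
active_grad_rows = w1.grad.abs().sum(dim=1) > 0    # shape (784,)

print(active_pixels.sum().item(), active_grad_rows.sum().item())
# Expected to print True (barring unlikely exact cancellations in the sums)
print(torch.equal(active_pixels, active_grad_rows))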
