PyTorch - parameters not changing
Question
In an effort to learn how PyTorch works, I am trying to do maximum likelihood estimation of some of the parameters of a multivariate normal distribution. However, it does not seem to work for any of the covariance-related parameters.
So my question is: why does this code not work?
import torch
def make_covariance_matrix(sigma, rho):
    return torch.tensor([[sigma[0]**2, rho * torch.prod(sigma)],
                         [rho * torch.prod(sigma), sigma[1]**2]])
mu_true = torch.randn(2)
rho_true = torch.rand(1)
sigma_true = torch.exp(torch.rand(2))
cov_true = make_covariance_matrix(sigma_true, rho_true)
dist_true = torch.distributions.MultivariateNormal(mu_true, cov_true)
samples = dist_true.sample((1_000,))
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
atanh_rho = torch.zeros(1, requires_grad=True)
lbfgs = torch.optim.LBFGS([mu, log_sigma, atanh_rho])
def closure():
    lbfgs.zero_grad()
    sigma = torch.exp(log_sigma)
    rho = torch.tanh(atanh_rho)
    cov = make_covariance_matrix(sigma, rho)
    dist = torch.distributions.MultivariateNormal(mu, cov)
    loss = -torch.mean(dist.log_prob(samples))
    loss.backward()
    return loss
lbfgs.step(closure)
print("mu: {}, mu_hat: {}".format(mu_true, mu))
print("sigma: {}, sigma_hat: {}".format(sigma_true, torch.exp(log_sigma)))
print("rho: {}, rho_hat: {}".format(rho_true, torch.tanh(atanh_rho)))
Output:
mu: tensor([0.4168, 0.1580]), mu_hat: tensor([0.4127, 0.1454], requires_grad=True)
sigma: tensor([1.1917, 1.7290]), sigma_hat: tensor([1., 1.], grad_fn=<ExpBackward>)
rho: tensor([0.3589]), rho_hat: tensor([0.], grad_fn=<TanhBackward>)
>>> torch.__version__
'1.0.0.dev20181127'
In other words, why have the estimates of log_sigma and atanh_rho not moved from their initial values?
Recommended answer
The way you create your covariance matrix is not backprop-able:
def make_covariance_matrix(sigma, rho):
    return torch.tensor([[sigma[0]**2, rho * torch.prod(sigma)],
                         [rho * torch.prod(sigma), sigma[1]**2]])
When you create a new tensor from other tensors with torch.tensor, only the values of the input tensors are kept. torch.tensor copies the data and strips all other information from the inputs, so the connection between the new tensor and your parameters in the computation graph is cut at that point and backpropagation cannot get through.
Here is a short example to illustrate this:
import torch
param1 = torch.rand(1, requires_grad=True)
param2 = torch.rand(1, requires_grad=True)
tensor_from_params = torch.tensor([param1, param2])
print('Original parameter 1:')
print(param1, param1.requires_grad)
print('Original parameter 2:')
print(param2, param2.requires_grad)
print('New tensor form params:')
print(tensor_from_params, tensor_from_params.requires_grad)
Output:
Original parameter 1:
tensor([ 0.8913]) True
Original parameter 2:
tensor([ 0.4785]) True
New tensor form params:
tensor([ 0.8913, 0.4785]) False
As you can see, the tensor created from the parameters param1 and param2 does not keep track of the gradients of param1 and param2.
Instead, you can use this code, which keeps the graph connection and is backprop-able:
def make_covariance_matrix(sigma, rho):
    # Concatenate the four entries into a flat tensor; torch.cat keeps the autograd graph.
    cov = torch.cat([(sigma[0]**2).view(-1),
                     rho * torch.prod(sigma),
                     rho * torch.prod(sigma),
                     (sigma[1]**2).view(-1)])
    return cov.view(2, 2)
The values are concatenated into a flat tensor using torch.cat and then brought into the right shape using view(). This results in the same matrix as your function, but it keeps the connection to your parameters log_sigma and atanh_rho.
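As an alternative sketch, torch.stack can build the same matrix row by row without the flatten-then-reshape step. The function name make_covariance_matrix_stack is hypothetical and not part of the original answer; the backward pass at the end is only there to check that autograd reaches both parameters.

```python
import torch

def make_covariance_matrix_stack(sigma, rho):
    # Hypothetical variant: same 2x2 matrix, assembled with torch.stack.
    # rho has shape (1,), so squeeze the product down to a 0-dim tensor.
    off_diag = (rho * torch.prod(sigma)).squeeze()
    row0 = torch.stack([sigma[0] ** 2, off_diag])
    row1 = torch.stack([off_diag, sigma[1] ** 2])
    return torch.stack([row0, row1])

# Same unconstrained parameters and initial values as in the question.
log_sigma = torch.zeros(2, requires_grad=True)
atanh_rho = torch.zeros(1, requires_grad=True)

cov = make_covariance_matrix_stack(torch.exp(log_sigma), torch.tanh(atanh_rho))

# Backpropagate through an arbitrary scalar function of the matrix.
cov.sum().backward()

print(cov.requires_grad)  # the matrix is still attached to the graph
print(log_sigma.grad)     # populated, so gradients reach log_sigma
print(atanh_rho.grad)     # populated, so gradients reach atanh_rho
```

Both .grad fields are filled in after backward(), which is exactly what the torch.tensor version could not provide.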
Here is the output before and after the step with the changed make_covariance_matrix. As you can see, the parameters can now be optimized and their values do change:
Before:
mu: tensor([ 0.1191, 0.7215]), mu_hat: tensor([ 0., 0.])
sigma: tensor([ 1.4222, 1.0949]), sigma_hat: tensor([ 1., 1.])
rho: tensor([ 0.2558]), rho_hat: tensor([ 0.])
After:
mu: tensor([ 0.1191, 0.7215]), mu_hat: tensor([ 0.0712, 0.7781])
sigma: tensor([ 1.4222, 1.0949]), sigma_hat: tensor([ 1.4410, 1.0807])
rho: tensor([ 0.2558]), rho_hat: tensor([ 0.2235])
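One way to sanity-check numbers like these is to compare them with the closed-form maximum likelihood estimates, since for a multivariate normal the MLE is the sample mean and the biased (divide-by-n) sample covariance. The sketch below is not from the original answer, and the "true" parameter values and the seed in it are assumed purely for illustration.

```python
import torch

torch.manual_seed(0)  # assumed seed, only for reproducibility

# Illustrative "true" parameters (assumed values, not the ones printed above).
mu_true = torch.tensor([0.5, -1.0])
s0, s1, rho_true = 1.5, 0.8, 0.3
cov_true = torch.tensor([[s0 ** 2, rho_true * s0 * s1],
                         [rho_true * s0 * s1, s1 ** 2]])

dist_true = torch.distributions.MultivariateNormal(mu_true, cov_true)
samples = dist_true.sample((20_000,))

# Closed-form MLE: sample mean and divide-by-n sample covariance.
mu_hat = samples.mean(dim=0)
centered = samples - mu_hat
cov_hat = centered.t() @ centered / samples.shape[0]

# Recover the standard deviations and correlation used in the question's parameterization.
sigma_hat = torch.sqrt(torch.diagonal(cov_hat))
rho_hat = cov_hat[0, 1] / (sigma_hat[0] * sigma_hat[1])

print("mu_hat:", mu_hat)
print("sigma_hat:", sigma_hat)
print("rho_hat:", rho_hat)
```

If the optimizer has converged, its estimates on the same samples should land close to these closed-form values.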
Hope this helps!