Pytorch - RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed
Problem description
I keep running into this error:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I searched the PyTorch forums but still can't figure out what I did wrong in my custom loss function. My model is an nn.GRU, and here is my custom loss function:
def _loss(outputs, session, items):  # `items` is a dict() contains embedding of all items
    def f(output, target):
        pos = torch.from_numpy(np.array([items[target["click"]]])).float()
        neg = torch.from_numpy(np.array([items[idx] for idx in target["suggest_list"] if idx != target["click"]])).float()
        if USE_CUDA:
            pos, neg = pos.cuda(), neg.cuda()
        pos, neg = Variable(pos), Variable(neg)
        pos = F.cosine_similarity(output, pos)
        if neg.size()[0] == 0:
            return torch.mean(F.logsigmoid(pos))
        neg = F.cosine_similarity(output.expand_as(neg), neg)
        return torch.mean(F.logsigmoid(pos - neg))
    loss = map(f, outputs, session)
    return -torch.mean(torch.cat(loss))
Training code:
# zero the parameter gradients
model.zero_grad()
# forward + backward + optimize
outputs, hidden = model(inputs, hidden)
loss = _loss(outputs, session, items)
acc_loss += loss.data[0]
loss.backward()
# Add parameters' gradients to their values, multiplied by learning rate
for p in model.parameters():
    p.data.add_(-learning_rate, p.grad.data)
The problem is in my training loop: it doesn't detach or repackage the hidden state between batches. As a result, loss.backward() tries to back-propagate all the way through to the start of time, which works for the first batch but not for the second, because the graph for the first batch has already been discarded.
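The failure described above can be reproduced in a few lines. This is only a minimal sketch: the toy nn.GRU, the tensor sizes, the random inputs and the stand-in loss are all assumptions made for illustration and are not taken from the original post. The hidden state returned by the first batch is fed straight into the second one, so the second backward() call is expected to reach the already-freed graph of the first batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup; sizes and data are placeholders.
gru = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
hidden = torch.zeros(1, 2, 8)  # (num_layers, batch, hidden_size)

failed_step = None
error_message = None
for step in range(2):
    inputs = torch.randn(2, 5, 4)  # (batch, seq_len, input_size)
    # `hidden` is reused as-is, so from the second batch on it still
    # points into the graph built for the previous batch.
    outputs, hidden = gru(inputs, hidden)
    loss = outputs.pow(2).mean()  # stand-in loss, not the custom loss above
    try:
        loss.backward()  # frees the saved buffers of the graph it walks
    except RuntimeError as exc:
        failed_step = step
        error_message = str(exc)
        break

print(failed_step, error_message)
```

The exact wording of the message differs between PyTorch versions, but in each case it suggests retain_graph=True.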
There are two possible solutions.
1) Detach/repackage the hidden state between batches. There are several ways to do this (this is the solution I chose), two of which are:
    hidden.detach_()
    hidden = hidden.detach()
2) Replace loss.backward() with loss.backward(retain_graph=True), but be aware that each successive batch will take longer than the previous one, because it has to back-propagate all the way back to the start of the first batch.
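As a rough sketch of the first solution, the loop below detaches the hidden state at the start of every batch. The toy nn.GRU, the sizes, the random inputs, the stand-in loss and the learning_rate value are assumptions for illustration only. The manual update mirrors the training code above but uses torch.no_grad() instead of .data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup; sizes, data and learning rate are placeholders.
gru = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
learning_rate = 0.01
hidden = torch.zeros(1, 2, 8)  # (num_layers, batch, hidden_size)

losses = []
for step in range(3):
    inputs = torch.randn(2, 5, 4)  # (batch, seq_len, input_size)
    # Keep the values of the hidden state but drop its link to the
    # previous batch's graph, so backward() stops at this batch.
    hidden = hidden.detach()
    gru.zero_grad()
    outputs, hidden = gru(inputs, hidden)
    loss = outputs.pow(2).mean()  # stand-in loss, not the custom loss above
    loss.backward()
    # Plain SGD step, applied outside of autograd tracking.
    with torch.no_grad():
        for p in gru.parameters():
            p -= learning_rate * p.grad
    losses.append(loss.item())

print(losses)
```

With the detach in place, each backward() call only walks the graph of the current batch, so the cost per batch stays constant instead of growing as it would with retain_graph=True.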