Pytorch - RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed


Problem Description


I keep running into this error:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

I have searched the PyTorch forum, but I still can't figure out what I have done wrong in my custom loss function. My model is an nn.GRU, and here is my custom loss function:

# Imports this snippet relies on (not shown in the original post); USE_CUDA is assumed to be defined elsewhere
import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def _loss(outputs, session, items):  # `items` is a dict containing the embeddings of all items
    def f(output, target):
        # Embedding of the clicked item (positive) and of the other suggested items (negatives)
        pos = torch.from_numpy(np.array([items[target["click"]]])).float()
        neg = torch.from_numpy(np.array([items[idx] for idx in target["suggest_list"] if idx != target["click"]])).float()
        if USE_CUDA:
            pos, neg = pos.cuda(), neg.cuda()
        pos, neg = Variable(pos), Variable(neg)

        pos = F.cosine_similarity(output, pos)
        if neg.size()[0] == 0:
            return torch.mean(F.logsigmoid(pos))
        neg = F.cosine_similarity(output.expand_as(neg), neg)

        return torch.mean(F.logsigmoid(pos - neg))

    loss = list(map(f, outputs, session))  # materialize the map so torch.cat can consume it
    return -torch.mean(torch.cat(loss))

Training code:

    # zero the parameter gradients
    model.zero_grad()

    # forward + backward + optimize
    outputs, hidden = model(inputs, hidden)
    loss = _loss(outputs, session, items)
    acc_loss += loss.data[0]

    loss.backward()
    # Add parameters' gradients to their values, multiplied by learning rate
    for p in model.parameters():
        p.data.add_(-learning_rate, p.grad.data)

Solution

The problem is in my training loop: it doesn't detach or repackage the hidden state between batches. As a result, loss.backward() tries to back-propagate all the way through to the start of time, which works for the first batch but not for the second, because the graph for the first batch has already been discarded.
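As a minimal illustration of the error itself (this is not the question's code, and it uses plain tensors with requires_grad rather than Variable), calling backward() a second time through a graph whose buffers were freed by the first call fails, while a fresh forward pass rebuilds the graph and lets backward() succeed again:

import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()   # the multiplication saves its inputs for the backward pass
y.backward()        # first backward: fine; the saved buffers are freed afterwards
# y.backward()      # a second backward on the same graph raises the RuntimeError above
y = (x * x).sum()   # a fresh forward pass rebuilds the graph
y.backward()        # so backward works again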

There are two possible solutions.

1) Detach/repackage the hidden state between batches. There are (at least) three ways to do this, and I chose this solution; a sketch follows after this list:

 hidden.detach_()
 hidden = hidden.detach()

2) Replace loss.backward() with loss.backward(retain_graph=True), but know that each successive batch will take more time than the previous one, because it will have to back-propagate all the way through to the start of the first batch.
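To make option 1 concrete, here is a minimal sketch of a recurrent training loop that detaches the hidden state after every batch, so each backward() only has to traverse the graph of the current batch. This is not the asker's actual loop: the layer sizes are hypothetical, and `batches` and `criterion` are placeholder names for the data iterator and the loss.

import torch
import torch.nn as nn

model = nn.GRU(input_size=8, hidden_size=16, batch_first=True)  # hypothetical sizes
criterion = nn.MSELoss()                                        # stand-in for the custom loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
hidden = None  # nn.GRU initializes the hidden state to zeros when None is passed

for inputs, targets in batches:   # `batches` is assumed to yield (input, target) tensor pairs
    optimizer.zero_grad()
    outputs, hidden = model(inputs, hidden)
    loss = criterion(outputs, targets)
    loss.backward()               # the graph only spans the current batch, so no RuntimeError
    optimizer.step()
    hidden = hidden.detach()      # cut the graph here: keep the values, drop the history

For option 2, only the call would change to loss.backward(retain_graph=True); the rest of the loop stays the same, at the growing cost described above.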

