Running LSTM with multiple GPUs gets "Input and hidden tensors are not at the same device"

Problem Description

I am trying to train an LSTM layer in PyTorch. I am using 4 GPUs. When initializing, I added the .cuda() function to move the hidden layer to the GPU. But when I run the code with multiple GPUs, I get this runtime error:

RuntimeError: Input and hidden tensors are not at the same device

I have tried to solve the problem by using the .cuda() function in the forward function, like below:

self.hidden = (self.hidden[0].type(torch.FloatTensor).cuda(), self.hidden[1].type(torch.FloatTensor).cuda()) 

This line seems to solve the problem, but it raises the concern of whether the updated hidden layer is seen by the different GPUs. Should I move the vector back to the CPU at the end of the forward function for a batch, or is there another way to solve the problem?

Recommended Answer

When you call .cuda() on a tensor, PyTorch moves it to the current default GPU device (GPU-0). So, with data parallelism, your data can live on one GPU while your model replica runs on another, and this mismatch results in the runtime error you are facing.
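
One common way to avoid pinning the hidden state to GPU-0 is to create it inside forward on the same device as the (already scattered) input. The sketch below is illustrative only and not part of the original answer; the class name and layer sizes are assumptions:

import torch
import torch.nn as nn

class DeviceAwareLSTM(nn.Module):  # hypothetical example class
    def __init__(self, input_size=10, hidden_size=20, num_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x):
        # x has already been scattered by DataParallel, so x.device is this
        # replica's GPU; allocate the initial states there instead of calling .cuda().
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        output, _ = self.lstm(x, (h0, c0))
        return output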

The correct way to implement data parallelism for recurrent neural networks is as follows:

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class MyModule(nn.Module):
    # ... __init__, other methods, etc.

    # padded_input is of shape [B x T x *] (batch_first mode) and contains
    # the sequences sorted by lengths
    #   B is the batch size
    #   T is max sequence length
    def forward(self, padded_input, input_lengths):
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True)
        packed_output, _ = self.my_lstm(packed_input)
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output

m = MyModule().cuda()
dp_m = nn.DataParallel(m)
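
Note that passing total_length to pad_packed_sequence is what makes this pattern safe under DataParallel: each replica only sees its own chunk of the batch, so without total_length the per-replica outputs could be padded to different maximum lengths and could not be gathered back into a single output tensor.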

For a multi-GPU setup, you also need to set the CUDA_VISIBLE_DEVICES environment variable accordingly.
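
For example (a minimal sketch; the device indices assume a machine with at least four GPUs):

import os

# Must be set before CUDA is initialized in this process
# (safest: before importing torch), so PyTorch only sees these four GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch
print(torch.cuda.device_count())  # expected to report 4 on such a machine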

References:

  • Data Parallelism
  • Fast.ai Forums
  • RNNs and Data Parallelism
