Running LSTM with multiple GPUs gets "Input and hidden tensors are not at the same device"


Problem description

I am trying to train an LSTM layer in PyTorch, using 4 GPUs. When initializing, I added a .cuda() call to move the hidden state to the GPU. But when I run the code with multiple GPUs I get this runtime error:

RuntimeError: Input and hidden tensors are not at the same device

I tried to solve the problem by calling .cuda() in the forward function, like below:

self.hidden = (self.hidden[0].type(torch.FloatTensor).cuda(), self.hidden[1].type(torch.FloatTensor).cuda()) 

This line seems to solve the problem, but it raises a concern: will the updated hidden state be visible across the different GPUs? Should I move the tensors back to the CPU at the end of the forward pass for each batch, or is there another way to solve the problem?

Solution

When you call .cuda() on a tensor, PyTorch moves it to the current GPU device (GPU-0 by default). So, under data parallelism, your data ends up on a different GPU than the model replica processing it, which results in the runtime error you are facing.
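
A device-safe pattern for creating the hidden state is to derive it from the input inside forward, so each replica builds it on the GPU that holds its slice of the batch. Below is a minimal sketch of that pattern; the module name and sizes are illustrative, not code from the original post:

import torch
import torch.nn as nn

class LSTMLocalHidden(nn.Module):
    def __init__(self, input_size=32, hidden_size=64, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x):
        # new_zeros allocates the initial states on the same device (and with the
        # same dtype) as the input slice this replica received, instead of calling
        # .cuda(), which targets the current default GPU.
        h0 = x.new_zeros(self.lstm.num_layers, x.size(0), self.lstm.hidden_size)
        c0 = x.new_zeros(self.lstm.num_layers, x.size(0), self.lstm.hidden_size)
        out, _ = self.lstm(x, (h0, c0))
        return out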

The correct way to implement data parallelism for recurrent neural networks is as follows:

from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
import torch.nn as nn

class MyModule(nn.Module):
    # ... __init__, other methods, etc.
    # (self.my_lstm used below is assumed to be an nn.LSTM created with batch_first=True)

    # padded_input is of shape [B x T x *] (batch_first mode) and contains
    # the sequences sorted by lengths
    #   B is the batch size
    #   T is max sequence length
    def forward(self, padded_input, input_lengths):
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True)
        packed_output, _ = self.my_lstm(packed_input)
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output

m = MyModule().cuda()
dp_m = nn.DataParallel(m)
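
A minimal usage sketch follows, under assumptions the original snippet leaves open: MyModule's elided __init__ is taken to define self.my_lstm as an nn.LSTM with 32 input features, 64 hidden units and batch_first=True. Note that on recent PyTorch versions pack_padded_sequence expects the lengths on the CPU, so input_lengths may need a .cpu() call inside forward.

import torch

# Hypothetical shapes: a batch of 8 sequences, max length 20, 32 features,
# sorted by length in descending order as pack_padded_sequence expects by default.
padded_input = torch.randn(8, 20, 32).cuda()
input_lengths = torch.tensor([20, 18, 17, 15, 12, 10, 9, 5])
# nn.DataParallel scatters tensor arguments along dim 0, so each replica gets
# its own slice of both the padded batch and the lengths.
output = dp_m(padded_input, input_lengths)
print(output.shape)  # torch.Size([8, 20, 64]); total_length keeps the padded time dimension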

You also need to set the CUDA_VISIBLE_DEVICES environment variable accordingly for a multi-GPU setup.
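
For example, the variable can be set either in the shell or in Python before CUDA is initialized (the script name below is hypothetical):

import os

# Expose four GPUs to this process. This must be set before the first CUDA
# call (e.g. before any .cuda()), otherwise it has no effect.
# Shell equivalent: CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"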

