Running LSTM with multiple GPUs gets "Input and hidden tensors are not at the same device"
Problem description
I am trying to train an LSTM layer in PyTorch, using 4 GPUs. When initializing, I added the .cuda() function to move the hidden layer to the GPU. But when I run the code with multiple GPUs, I get this runtime error:
RuntimeError: Input and hidden tensors are not at the same device
I have tried to solve the problem by using the .cuda() function in the forward function, like below:
self.hidden = (self.hidden[0].type(torch.FloatTensor).cuda(), self.hidden[1].type(torch.FloatTensor).cuda())
This line seems to solve the problem, but it raises my concern about whether the updated hidden layer is seen by the different GPUs. Should I move the vector back to the CPU at the end of the forward function for each batch, or is there another way to solve the problem?
Recommended answer
When you call .cuda() on a tensor, PyTorch moves it to the current default GPU device (GPU-0). So, under data parallelism, your data lives on one GPU while your model's replica runs on another, which results in the runtime error you are facing.
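If you want to manage the hidden state yourself, a safer pattern than calling .cuda() is to create it inside forward on the same device as the incoming batch, so each DataParallel replica builds its hidden state on its own GPU. Below is a minimal sketch; the class name, layer sizes, and attribute names are assumptions for illustration, not code from the question:

import torch
import torch.nn as nn

class DeviceSafeLSTM(nn.Module):
    def __init__(self, input_size=16, hidden_size=32):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # x: [B x T x input_size]; new_zeros builds (h0, c0) on x's device and
        # dtype, so each replica's hidden state matches its slice of the batch.
        h0 = x.new_zeros(1, x.size(0), self.hidden_size)
        c0 = x.new_zeros(1, x.size(0), self.hidden_size)
        output, _ = self.lstm(x, (h0, c0))
        return output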
The correct way to implement data parallelism for recurrent neural networks is as follows:
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class MyModule(nn.Module):
    # ... __init__, other methods, etc.

    # padded_input is of shape [B x T x *] (batch_first mode) and contains
    # the sequences sorted by lengths
    #   B is the batch size
    #   T is the max sequence length
    def forward(self, padded_input, input_lengths):
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True)
        packed_output, _ = self.my_lstm(packed_input)
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output

m = MyModule().cuda()
dp_m = nn.DataParallel(m)
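For completeness, a hedged usage sketch of the wrapper above; the shapes, dummy data, and the assumption that __init__ defines self.my_lstm are illustrative, not part of the original answer:

import torch

# Continuing from the snippet above, with 4 visible GPUs.
padded_input = torch.randn(8, 50, 16).cuda()   # dummy [B x T x features] batch, sorted by length
input_lengths = torch.tensor([50] * 8)         # one length per sequence
output = dp_m(padded_input, input_lengths)     # batch dim is split across GPUs, output gathered on GPU-0
# Note: depending on the PyTorch version, pack_padded_sequence may require the
# lengths to be a CPU int64 tensor, so input_lengths.cpu() inside forward may be needed.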
You also need to set the CUDA_VISIBLE_DEVICES environment variable accordingly for a multi-GPU setup.
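For example, to expose four GPUs to the process (the device indices and script name here are only an illustration), you can set the variable in the shell before launching the script, or from Python before any CUDA call is made:

import os

# Equivalent to running: CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py
# This must happen before PyTorch initializes CUDA for it to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"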
References:
- Data Parallelism
- Fast.ai Forums
- RNNs and Data Parallelism