Increasing batch_size of dataset for Pytorch neural network
Problem description
I currently have my neural network training with batch_size=1. To run it across multiple GPUs I need to increase the batch size to be larger than the number of GPUs, so I want batch_size=16. However, given the way my data is set up, I am not sure how to change that.
The data is read from a csv file
raw_data = pd.read_csv("final.csv")
train_data = raw_data[:750]
test_data = raw_data[750:]
Then the data is normalized and turned to Tensors
# normalize features
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_train = scaler.fit_transform(train_data)
scaled_test = scaler.transform(test_data)
# Turn into PyTorch tensors
train_data_normalized = torch.FloatTensor(scaled_train).view(-1)
test_data_normalized = torch.FloatTensor(scaled_test).view(-1)
Then the data is turned into tuples of (input sequence, output) tensors, e.g. (tensor([1, 3, 56, 63, 3]), tensor([34]))
# Convert to tensor tuples
def input_series_sequence(input_data, tw):
    inout_seq = []
    L = len(input_data)
    i = 0
    for index in range(L - tw):
        train_seq = input_data[i:i + tw]
        train_label = input_data[i + tw:i + tw + 1]
        inout_seq.append((train_seq, train_label))
        i = i + tw
    return inout_seq
train_inout_seq = input_series_sequence(train_data_normalized, train_window)
test_input_seq = input_series_sequence(test_data_normalized, train_window)
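Since `train_inout_seq` is already a list of (sequence, label) tensor tuples, one way to get batches of 16 is to wrap it in a `Dataset` and hand it to a `DataLoader`, whose default collate function stacks the tuples into batched tensors. A minimal sketch (the window size and dummy data here are assumptions standing in for the asker's actual `train_inout_seq`):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SequenceDataset(Dataset):
    """Wraps a list of (sequence, label) tensor tuples so a DataLoader can batch them."""
    def __init__(self, inout_seq):
        self.inout_seq = inout_seq

    def __len__(self):
        return len(self.inout_seq)

    def __getitem__(self, idx):
        return self.inout_seq[idx]

# hypothetical stand-in for train_inout_seq: 32 windows of length 5 with scalar labels
train_window = 5
dummy_inout_seq = [(torch.randn(train_window), torch.randn(1)) for _ in range(32)]

# the default collate function stacks each field of the tuples along a new batch dim
loader = DataLoader(SequenceDataset(dummy_inout_seq), batch_size=16, shuffle=True)
seqs, labels = next(iter(loader))
print(seqs.shape, labels.shape)  # torch.Size([16, 5]) torch.Size([16, 1])
```

Iterating over `loader` in the training loop then yields one stacked (16, train_window) tensor and one (16, 1) label tensor per step, instead of one sample at a time.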
Then the model is trained like this
for i in range(epochs):
    for seq, labels in train_inout_seq:
        optimizer.zero_grad()
        model.module.hidden_cell = model.module.init_hidden()
        seq = seq.to(device)
        labels = labels.to(device)
        y_pred = model(seq)
        single_loss = loss_function(y_pred, labels)
        single_loss.backward()
        optimizer.step()
So I want to know how exactly to change the batch_size from 1 to 16. Do I need to use Dataset and DataLoader? And if so, how exactly would it fit in with my current code? Thanks!
The model is defined like this; the forward function might have to change:
class LSTM(nn.Module):
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size)
        self.linear = nn.Linear(hidden_layer_size, output_size)
        self.hidden_cell = (torch.zeros(1, 1, self.hidden_layer_size),
                            torch.zeros(1, 1, self.hidden_layer_size))

    def init_hidden(self):
        return (torch.zeros(1, 1, self.hidden_layer_size),
                torch.zeros(1, 1, self.hidden_layer_size))

    def forward(self, input_seq):
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]
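The model above hard-codes a batch dimension of 1 in its hidden state (`torch.zeros(1, 1, ...)`) and in the `view` calls, so the forward function does indeed have to change for batches. One possible adaptation (a sketch, not the asker's code: `batch_first=True` and the class name `BatchedLSTM` are choices made here for illustration) is to size the hidden state by the incoming batch and take the last time step per sequence:

```python
import torch
import torch.nn as nn

class BatchedLSTM(nn.Module):
    """Variant of the LSTM model above adapted for batched (batch, seq_len) input."""
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        # batch_first=True lets us feed (batch, seq_len, input_size) tensors
        self.lstm = nn.LSTM(input_size, hidden_layer_size, batch_first=True)
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def init_hidden(self, batch_size):
        # hidden/cell states are (num_layers, batch, hidden) regardless of batch_first
        return (torch.zeros(1, batch_size, self.hidden_layer_size),
                torch.zeros(1, batch_size, self.hidden_layer_size))

    def forward(self, input_seq):
        batch_size, seq_len = input_seq.shape
        hidden = self.init_hidden(batch_size)
        # add the input_size=1 feature dimension: (batch, seq_len) -> (batch, seq_len, 1)
        lstm_out, hidden = self.lstm(input_seq.view(batch_size, seq_len, 1), hidden)
        # keep only the last time step's output for each sequence in the batch
        return self.linear(lstm_out[:, -1, :])

model = BatchedLSTM()
out = model(torch.randn(16, 5))  # a batch of 16 windows of length 5
print(out.shape)  # torch.Size([16, 1])
```

This returns one prediction per sequence in the batch, rather than a single `predictions[-1]`.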
Answer
You can do this by wrapping your model in the nn.DataParallel class.
model = nn.DataParallel(model)
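nn.DataParallel splits each input batch along dimension 0 across the available GPUs, which is why the batch size needs to be at least the number of GPUs. On a machine without GPUs it falls back to simply calling the wrapped module, so the wrapping itself is harmless. A minimal sketch (the Linear model here is a stand-in for the LSTM above):

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 1)          # stand-in for the LSTM model in the question
model = nn.DataParallel(model)   # splits batches along dim 0 across GPUs when present

batch = torch.randn(16, 5)       # batch of 16, divided among devices at forward time
out = model(batch)
print(out.shape)  # torch.Size([16, 1])

# note: the original module's attributes now live under model.module,
# which is why the question's training loop uses model.module.init_hidden()
```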
Since I don't have access to multiple GPUs and your data right now to test, I'll direct you here.