Increasing batch_size of dataset for Pytorch neural network


Question

I currently train my neural network with batch_size = 1. To run it across multiple GPUs, the batch size needs to be larger than the number of GPUs, so I want batch_size = 16. Given the way my data is set up, though, I am not sure how to change that.

The data is read from a CSV file:

import pandas as pd

raw_data = pd.read_csv("final.csv")
train_data = raw_data[:750]
test_data = raw_data[750:]

Then the data is normalized and turned into tensors:

import torch
from sklearn.preprocessing import MinMaxScaler

# normalize features
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_train = scaler.fit_transform(train_data)
scaled_test = scaler.transform(test_data)
# turn into PyTorch tensors
train_data_normalized = torch.FloatTensor(scaled_train).view(-1)
test_data_normalized = torch.FloatTensor(scaled_test).view(-1)

Then the data is turned into a list of (input sequence, target) tensor tuples, e.g. (tensor([1, 3, 56, 63, 3]), tensor([34])):

# convert to tensor tuples: each window of length tw is paired with the next value
def input_series_sequence(input_data, tw):
    inout_seq = []
    L = len(input_data)
    for i in range(L - tw):
        train_seq = input_data[i:i + tw]
        train_label = input_data[i + tw:i + tw + 1]
        inout_seq.append((train_seq, train_label))
    return inout_seq


train_inout_seq = input_series_sequence(train_data_normalized, train_window)
test_input_seq = input_series_sequence(test_data_normalized, train_window)
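As a quick sanity check of the windowing logic, here is a standalone sketch run on a plain Python list instead of a tensor (slicing behaves the same way):

```python
def input_series_sequence(input_data, tw):
    # sliding window: each tw-length slice is paired with the value that follows it
    inout_seq = []
    L = len(input_data)
    for i in range(L - tw):
        train_seq = input_data[i:i + tw]
        train_label = input_data[i + tw:i + tw + 1]
        inout_seq.append((train_seq, train_label))
    return inout_seq

data = [1, 3, 56, 63, 3, 34]
pairs = input_series_sequence(data, tw=5)
# -> [([1, 3, 56, 63, 3], [34])], matching the example pair above
```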

Then the model is trained like this:

for i in range(epochs):
    for seq, labels in train_inout_seq:
        optimizer.zero_grad()
        model.module.hidden_cell = model.module.init_hidden()
        seq = seq.to(device)
        labels = labels.to(device)
        y_pred = model(seq)

        single_loss = loss_function(y_pred, labels)
        single_loss.backward()
        optimizer.step()

So I want to know exactly how to change the batch_size from 1 to 16. Do I need to use Dataset and DataLoader? If so, how exactly would they fit in with my current code? Thanks!

The model is defined like this; the forward function might have to change:

class LSTM(nn.Module):
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size

        self.lstm = nn.LSTM(input_size, hidden_layer_size)

        self.linear = nn.Linear(hidden_layer_size, output_size)

        self.hidden_cell = (torch.zeros(1, 1, self.hidden_layer_size),
                            torch.zeros(1, 1, self.hidden_layer_size))

    def init_hidden(self):
        return (torch.zeros(1, 1, self.hidden_layer_size),
                torch.zeros(1, 1, self.hidden_layer_size))

    def forward(self, input_seq):
        lstm_out, self.hidden_cell = self.lstm(input_seq.view(len(input_seq), 1, -1), self.hidden_cell)
        predictions = self.linear(lstm_out.view(len(input_seq), -1))
        return predictions[-1]
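On the forward function: one possible adaptation (an untested sketch, not the original author's code) is to construct the LSTM with batch_first=True and feed it (batch, seq_len) inputs, taking the last time step of each sequence as the prediction. The class name BatchedLSTM is hypothetical:

```python
import torch
import torch.nn as nn

class BatchedLSTM(nn.Module):
    # hypothetical batch-aware variant of the model above, assuming batch_first=True
    def __init__(self, input_size=1, hidden_layer_size=100, output_size=1):
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.lstm = nn.LSTM(input_size, hidden_layer_size, batch_first=True)
        self.linear = nn.Linear(hidden_layer_size, output_size)

    def forward(self, input_seq):
        # input_seq: (batch, seq_len); add a trailing feature dim for input_size=1
        out, _ = self.lstm(input_seq.unsqueeze(-1))
        # out: (batch, seq_len, hidden); keep the last time step of each sequence
        return self.linear(out[:, -1, :])
```

Letting the LSTM default to a zero initial hidden state each call also removes the need to reset hidden_cell manually in the training loop.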

Answer

You can do this by wrapping your model in the nn.DataParallel class:

model = nn.DataParallel(model)

Since I don't have access to multiple GPUs and your data right now to test, I'll direct you here
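To actually get batches of 16 out of the (sequence, label) tuple list, the usual route is torch.utils.data.DataLoader: a plain Python list of tensor pairs already satisfies the map-style dataset interface, and the default collate_fn stacks same-shaped tensors along a new batch dimension. A minimal sketch with random stand-in data (in the real code the list would be train_inout_seq):

```python
import torch
from torch.utils.data import DataLoader

# stand-in for train_inout_seq: 32 pairs of (5-step sequence, 1-value label)
pairs = [(torch.randn(5), torch.randn(1)) for _ in range(32)]

# the default collate_fn stacks same-shaped tensors into a batch dimension
loader = DataLoader(pairs, batch_size=16, shuffle=True)

for seq_batch, label_batch in loader:
    # seq_batch: (16, 5), label_batch: (16, 1)
    pass
```

The inner training loop would then iterate over loader instead of train_inout_seq, and the model's forward pass has to accept the extra batch dimension.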
