How do you test a custom dataset in Pytorch?


Problem Description


I've been following PyTorch tutorials that use the datasets shipped with PyTorch, which let you specify whether you want the data for training or not... But now I'm using a .csv file and a custom dataset.

import pandas as pd
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, root, n_inp):
        self.df = pd.read_csv(root)
        self.data = self.df.to_numpy()
        # The first n_inp columns are the inputs, the remaining columns are the targets
        self.x, self.y = (torch.from_numpy(self.data[:, :n_inp]),
                          torch.from_numpy(self.data[:, n_inp:]))

    def __getitem__(self, idx):
        return self.x[idx, :], self.y[idx, :]

    def __len__(self):
        return len(self.data)

How can I tell Pytorch not to train on my test_dataset, so I can use it as a reference for how accurate my model is?

from torch.utils.data import DataLoader

train_dataset = MyDataset("heart.csv", input_size)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_dataset = MyDataset("heart.csv", input_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

Solution

In PyTorch, a custom dataset inherits from the Dataset class. It mainly implements two methods: __len__(), which specifies the length of the dataset object you iterate over, and __getitem__(), which returns one sample of data at a time (the DataLoader then takes care of grouping samples into batches).
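For example, a quick sketch of how the MyDataset class from the question is consumed directly (assuming input_size matches the number of feature columns in heart.csv):

dataset = MyDataset("heart.csv", input_size)
print(len(dataset))   # number of rows in the csv, via __len__()
x0, y0 = dataset[0]   # a single (features, target) pair, via __getitem__()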

Once the dataloader objects are initialized (train_loader and test_loader as specified in your code), you need to write a train loop and a test loop.

def train(model, optimizer, loss_fn, dataloader):
    model.train()  # training mode: enables dropout, updates batch-norm running stats
    for i, (input, gt) in enumerate(dataloader):
        if params.use_gpu:  # (if training on the GPU)
            input, gt = input.cuda(non_blocking=True), gt.cuda(non_blocking=True)
        predicted = model(input)
        loss = loss_fn(predicted, gt)
        optimizer.zero_grad()  # clear gradients accumulated from the previous batch
        loss.backward()        # back-propagate
        optimizer.step()       # update the model parameters

and your test loop should be:

def test(model, loss_fn, dataloader):
    model.eval()  # evaluation mode: disables dropout, uses batch-norm running stats
    for i, (input, gt) in enumerate(dataloader):
        if params.use_gpu:  # (if evaluating on the GPU)
            input, gt = input.cuda(non_blocking=True), gt.cuda(non_blocking=True)
        predicted = model(input)
        loss = loss_fn(predicted, gt)

In addition, you can use a metrics dictionary to log your predictions, loss, epochs, etc. The main difference between the training loop and the test loop is that we exclude back-propagation (zero_grad(), backward(), step()) during the inference stage.
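For instance, here is a minimal sketch of such a test loop that accumulates running metrics in a dictionary and wraps the loop in torch.no_grad() so no gradients are tracked. The function name test_with_metrics is made up for illustration, and the accuracy line assumes a classification setup (model outputs class scores, gt holds integer class labels), which may not match your data:

import torch

def test_with_metrics(model, loss_fn, dataloader):
    model.eval()
    metrics = {"loss": 0.0, "correct": 0, "total": 0}
    with torch.no_grad():  # gradients are not needed at inference time
        for input, gt in dataloader:
            predicted = model(input)
            metrics["loss"] += loss_fn(predicted, gt).item() * input.size(0)
            # Assumes each output row is a vector of class scores and gt holds
            # integer class labels -- adjust this line for your own task.
            metrics["correct"] += (predicted.argmax(dim=1) == gt.view(-1)).sum().item()
            metrics["total"] += input.size(0)
    metrics["loss"] /= metrics["total"]
    metrics["accuracy"] = metrics["correct"] / metrics["total"]
    return metrics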

Finally,

for epoch in range(1, epochs + 1):
    train(model, optimizer, loss_fn, train_loader)
    test(model, loss_fn, test_loader)
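One point worth noting beyond the answer above: in the question, both loaders are built from the same heart.csv, so the "test" set is identical to the training set. A common way to get a genuinely held-out test set is torch.utils.data.random_split; a sketch, assuming a roughly 80/20 split is acceptable for your data:

from torch.utils.data import DataLoader, random_split

full_dataset = MyDataset("heart.csv", input_size)
n_test = len(full_dataset) // 5                 # hold out roughly 20% for testing
n_train = len(full_dataset) - n_test
train_dataset, test_dataset = random_split(full_dataset, [n_train, n_test])

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)  # no need to shuffle test data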
