PyTorch: Testing with torchvision.datasets.ImageFolder and DataLoader


Question

I'm a newbie trying to make this PyTorch CNN work with the Cats & Dogs dataset from Kaggle. Since the test images have no targets, I manually classified some of them and put the class in the filename so that I could test (maybe I should have just used some of the train images).

I used the torchvision.datasets.ImageFolder class to load the train and test images. The training seems to work.

But what do I need to do to make the test routine work? I don't know how to connect my test_data_loader to the test loop at the bottom via test_x and test_y.

The code is based on this MNIST example CNN. There, something like the following is used right after the loaders are created, but I failed to rewrite it for my dataset:

test_x = Variable(torch.unsqueeze(test_data.test_data, dim=1), volatile=True).type(torch.FloatTensor)[:2000]/255.   # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
test_y = test_data.test_labels[:2000]

Code:

import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torch.utils.data as data
import torchvision
from torchvision import transforms

EPOCHS = 2
BATCH_SIZE = 10
LEARNING_RATE = 0.003
TRAIN_DATA_PATH = "./train_cl/"
TEST_DATA_PATH = "./test_named_cl/"
TRANSFORM_IMG = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225] )
    ])

train_data = torchvision.datasets.ImageFolder(root=TRAIN_DATA_PATH, transform=TRANSFORM_IMG)
train_data_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True,  num_workers=4)
test_data = torchvision.datasets.ImageFolder(root=TEST_DATA_PATH, transform=TRANSFORM_IMG)
test_data_loader  = data.DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=4) 

class CNN(nn.Module):
    # omitted...

if __name__ == '__main__':

    print("Number of train samples: ", len(train_data))
    print("Number of test samples: ", len(test_data))
    print("Detected Classes are: ", train_data.class_to_idx) # classes are detected by folder structure

    model = CNN()    
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    loss_func = nn.CrossEntropyLoss()    

    # Training and Testing
    for epoch in range(EPOCHS):        
        for step, (x, y) in enumerate(train_data_loader):
            b_x = Variable(x)   # batch x (image)
            b_y = Variable(y)   # batch y (target)
            output = model(b_x)[0]          
            loss = loss_func(output, b_y)   
            optimizer.zero_grad()           
            loss.backward()                 
            optimizer.step()

            # Test -> this is where I have no clue
            if step % 50 == 0:
                test_x = Variable(test_data_loader)
                test_output, last_layer = model(test_x)
                pred_y = torch.max(test_output, 1)[1].data.squeeze()
                accuracy = sum(pred_y == test_y) / float(test_y.size(0))
                print('Epoch: ', epoch, '| train loss: %.4f' % loss.data[0], '| test accuracy: %.2f' % accuracy)

Recommended answer

Looking at the data from Kaggle and your code, there seem to be problems in your data loading for both the train and test sets. First of all, the default PyTorch ImageFolder expects the images for each label to sit in their own folder. In your case, since all the training data is in the same folder, PyTorch loads it as a single class, which is why learning only seems to be working. You can correct this with a folder structure like train/dog, train/cat, test/dog, test/cat, and then pass the train and test folders to the train and test ImageFolder respectively. The training code seems fine; just change the folder structure and you should be good. Take a look at the official documentation of ImageFolder, which has a similar example.
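A minimal sketch of both pieces, not a definitive implementation. It assumes the class is the first dot-separated part of each filename (for example cat.0.jpg, as in the Kaggle train set; your hand-labelled test files may be named differently). The helper names sort_into_class_folders and evaluate are hypothetical, not part of PyTorch or torchvision. The first helper copies files into the per-class folders that ImageFolder expects. The second replaces the test_x / test_y approach by iterating over a DataLoader directly.

```python
import shutil
from pathlib import Path

import torch


def sort_into_class_folders(src_dir, dst_dir, classes=("cat", "dog")):
    """Copy files named like 'cat.0.jpg' into dst_dir/<class>/.

    The 'class.index.ext' naming is an assumption; adapt the label
    extraction if your files follow a different pattern.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    counts = {}
    for name in classes:
        (dst / name).mkdir(parents=True, exist_ok=True)
        counts[name] = 0
    for path in sorted(src.iterdir()):
        label = path.name.split(".")[0].lower()
        if path.is_file() and label in classes:
            shutil.copy2(path, dst / label / path.name)
            counts[label] += 1
    return counts  # number of files copied per class


def evaluate(model, loader):
    """Return the accuracy of `model` over every batch yielded by `loader`."""
    model.eval()                  # disable dropout / batch-norm updates
    correct, total = 0, 0
    with torch.no_grad():         # no gradients needed while testing
        for x, y in loader:
            output = model(x)
            # The CNN in the question returns (logits, last_layer).
            if isinstance(output, tuple):
                output = output[0]
            pred = output.argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.size(0)
    model.train()                 # restore training mode
    return correct / total if total else 0.0
```

With the folders in place, the test step in the question's loop could become accuracy = evaluate(model, test_data_loader). On current PyTorch versions the Variable wrappers are no longer needed, and loss.item() takes the place of loss.data[0].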
