Unexpected increase in validation error in MNIST PyTorch

Question

I'm a bit new to the whole field and thus decided to work on the MNIST dataset. I adapted pretty much all of the code from https://github.com/pytorch/examples/blob/master/mnist/main.py, with only one significant change: data loading. I didn't want to use the pre-loaded dataset within Torchvision, so I used MNIST in CSV.

I loaded the data from the CSV files by inheriting from Dataset and making a new DataLoader. Here's the relevant code:

# Statistics computed over the raw 0-255 pixel values
mean = 33.318421449829934
sd = 78.56749081851163
# Equivalent statistics for pixels scaled to [0, 1]:
# mean = 0.1307
# sd = 0.3081
import numpy as np
import pandas as pd  # needed for pd.read_csv below
from torch.utils.data import Dataset, DataLoader

class dataset(Dataset):
    def __init__(self, csv, transform=None):
        data = pd.read_csv(csv, header=None)
        self.X = np.array(data.iloc[:, 1:]).reshape(-1, 28, 28, 1).astype('float32')
        self.Y = np.array(data.iloc[:, 0])

        del data
        self.transform = transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        item = self.X[idx]
        label = self.Y[idx]

        if self.transform:
            item = self.transform(item)

        return (item, label)

import torchvision.transforms as transforms

# ToTensor() only rescales to [0, 1] for uint8 input; float32 arrays pass
# through unscaled, hence the 0-255 statistics above.
trainData = dataset('mnist_train.csv', transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((mean,), (sd,))
]))
testData = dataset('mnist_test.csv', transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((mean,), (sd,))
]))

train_loader = DataLoader(dataset=trainData,
                          batch_size=10,
                          shuffle=True)
test_loader = DataLoader(dataset=testData,
                         batch_size=10,
                         shuffle=True)

However, this code gives me the absolutely weird training-error graph that you see in the picture, and a final validation error of 11%, because it classifies everything as a '7'.

I managed to track the problem down to how I normalize the data: if I use the values given in the example code (0.1307 and 0.3081) for transforms.Normalize, along with reading the data as type 'uint8', it works perfectly. Note that there is very minimal difference between the data produced in these two cases: normalizing values from 0 to 1 by 0.1307 and 0.3081 has the same effect as normalizing values from 0 to 255 by 33.31 and 78.56. The resulting values are even mostly the same (a black pixel corresponds to -0.4242 in the first case and -0.4241 in the second).
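A quick sanity check of that equivalence with plain arithmetic (my own sketch, not from the original code):

# The same black pixel (value 0) under both normalizations:
print((0 / 255 - 0.1307) / 0.3081)                      # -0.4242..., scaled-to-[0,1] path
print((0 - 33.318421449829934) / 78.56749081851163)     # -0.4241..., raw 0-255 path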

If you would like to see an IPython Notebook where this problem is demonstrated clearly, please check out https://colab.research.google.com/drive/1W1qx7IADpnn5e5w97IcxVvmZAaMK9vL3

I am unable to understand what causes such a huge difference in behaviour between these two slightly different ways of loading data. Any help would be massively appreciated.

Answer

Long story short: you need to change item = self.X[idx] to item = self.X[idx].copy().
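With that change, __getitem__ hands the transform a private copy of the row, so an in-place transform can no longer touch self.X:

    def __getitem__(self, idx):
        item = self.X[idx].copy()  # copy, so in-place transforms cannot mutate self.X
        label = self.Y[idx]

        if self.transform:
            item = self.transform(item)

        return (item, label)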

Long story long: T.ToTensor() runs torch.from_numpy, which returns a tensor that aliases the memory of the numpy array dataset.X. And T.Normalize() works in place, so each time a sample is drawn it has the mean subtracted and is divided by the std, progressively degrading your dataset.
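A minimal sketch of the aliasing; since Normalize's in-place behaviour varies across torchvision versions (newer releases expose an explicit inplace flag), this uses explicit in-place tensor ops to stand in for it:

import numpy as np
import torch

arr = np.ones((2, 2), dtype=np.float32)
t = torch.from_numpy(arr)   # no copy: t aliases arr's memory
t.sub_(0.5).div_(2.0)       # in-place ops, as Normalize does here
print(arr)                  # all 0.25 -- arr was mutated through t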

Regarding why it works with the original MNIST loader, the rabbit hole goes even deeper. The key line in MNIST is that the image is transformed into a PIL.Image instance. The operation claims to copy only if the buffer is not contiguous (it is contiguous in our case), but under the hood it checks whether the buffer is strided instead (which it is), and thus copies it. So, by luck, the default torchvision pipeline involves a copy, and the in-place operation of T.Normalize() does not corrupt the in-memory self.data of our MNIST instance.
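A sketch of why that copy protects the stock pipeline, again standing in for an in-place Normalize with explicit in-place ops:

import numpy as np
from PIL import Image
from torchvision import transforms

arr = np.full((28, 28), 255, dtype=np.uint8)
img = Image.fromarray(arr, mode='L')   # copies the buffer under the hood
t = transforms.ToTensor()(img)         # tensor backed by the copy, not arr
t.sub_(0.1307).div_(0.3081)            # mimic an in-place Normalize
print(arr[0, 0])                       # still 255: arr was never aliased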
