How do you alter the size of a PyTorch Dataset?

Question

Say I am loading MNIST from torchvision.datasets.MNIST, but I only want to load 10,000 images in total. How would I slice the data to limit it to only some number of data points? I understand that the DataLoader is a generator yielding data in batches of the specified batch size, but how do you slice datasets?

from torch.utils.data import DataLoader
from torchvision import datasets

# transform, args and kwargs are defined elsewhere in the asker's script.
tr = datasets.MNIST('../data', train=True, download=True, transform=transform)
te = datasets.MNIST('../data', train=False, transform=transform)
train_loader = DataLoader(tr, batch_size=args.batch_size, shuffle=True, num_workers=4, **kwargs)
test_loader = DataLoader(te, batch_size=args.batch_size, shuffle=True, num_workers=4, **kwargs)

Recommended answer

It is important to note that when you create the DataLoader object, it doesn't immediately load all of your data (that would be impractical for large datasets). Instead, it provides an iterator that yields the data one batch at a time.

Unfortunately, DataLoader doesn't provide a direct option for limiting the number of samples you extract, so you will have to use the typical ways of slicing iterators.
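The limitation above concerns the loader. A different technique, not part of the original answer, is to shrink the dataset itself before wrapping it, using torch.utils.data.Subset. The sketch below is a minimal illustration under stated assumptions: it uses a small synthetic TensorDataset as a stand-in for MNIST so that it runs without a download, and the sizes (1,000 samples, 100 kept, batch size 32) are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Synthetic stand-in for MNIST (assumption): 1,000 blank 28x28 images with
# integer labels, so the sketch runs without downloading anything.
images = torch.zeros(1000, 1, 28, 28)
labels = torch.arange(1000) % 10
full_dataset = TensorDataset(images, labels)

# Keep only the first 100 samples. Subset stores the chosen indices and reads
# from the wrapped dataset on access; it does not copy the data.
small_dataset = Subset(full_dataset, range(100))

loader = DataLoader(small_dataset, batch_size=32, shuffle=True)

# Count how many samples one full pass over the loader produces.
samples_seen = sum(batch_images.size(0) for batch_images, _ in loader)
print(len(small_dataset), samples_seen)
```

With the dataset from the question, the equivalent would presumably be to pass Subset(tr, range(10000)) to DataLoader in place of tr. Shuffling then happens within the 10,000 kept samples only.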

The simplest thing to do (without any libraries) is to stop once the required number of samples has been reached.

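nsamples = 10000
# Each iteration yields one batch, so compare samples seen so far, not i.
for i, (images, labels) in enumerate(train_loader):
    if i * train_loader.batch_size >= nsamples:
        break

    # Your training code here.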

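Alternatively, you can use itertools.islice to stop early. islice takes its stop value positionally and counts iterations, which here are batches, so the first 10k samples correspond to 10000 // batch_size batches (rounded down). Like so:

import itertools

num_batches = 10000 // train_loader.batch_size
for images, labels in itertools.islice(train_loader, num_batches):
    ...  # Your training code here.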
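Because a loader yields one batch per iteration, slicing it with itertools.islice counts batches rather than samples. The sketch below illustrates the arithmetic with a plain Python generator, fake_loader, which is a hypothetical stand-in and not a real DataLoader. The batch size of 64 and the dataset size of 60,000 are assumptions chosen for the example.

```python
import itertools


def fake_loader(num_samples, batch_size):
    """Hypothetical stand-in for a DataLoader: yields lists of sample indices."""
    for start in range(0, num_samples, batch_size):
        yield list(range(start, min(start + batch_size, num_samples)))


def count_samples(num_batches, batch_size):
    """Total samples produced by the first num_batches batches."""
    batches = itertools.islice(fake_loader(60000, batch_size), num_batches)
    return sum(len(batch) for batch in batches)


batch_size = 64
wanted_samples = 10000

# Rounding down never exceeds the target; rounding up always reaches it.
floor_batches = wanted_samples // batch_size
ceil_batches = -(-wanted_samples // batch_size)  # ceiling division

under = count_samples(floor_batches, batch_size)
over = count_samples(ceil_batches, batch_size)
print(floor_batches, under, ceil_batches, over)
```

When the target is not a multiple of the batch size, neither choice lands exactly on it. If an exact count matters, the last batch has to be trimmed by hand or the dataset itself limited before it is wrapped in a loader.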
