PyTorch: How to use DataLoaders for custom Datasets

Question

How can I use torch.utils.data.Dataset and torch.utils.data.DataLoader on my own data (not just the torchvision.datasets)?

Is there a way to use the built-in DataLoaders, which torchvision uses for its datasets, with any dataset?

Recommended answer

Yes, that is possible. Just create the objects yourself, e.g.

import torch.utils.data as data_utils

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

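where features and targets are tensors. TensorDataset indexes every tensor along its first dimension, so that dimension must be the sample dimension and must have the same length in both tensors. For tabular data, features is typically 2-D, i.e. a matrix where each row represents one training sample, and targets may be 1-D or 2-D, depending on whether you are trying to predict a scalar or a vector for each sample.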
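A minimal, self-contained sketch of the snippet above, assuming randomly generated placeholder data (the sizes, 1,000 samples with 20 features each and binary integer labels, are purely illustrative). Iterating over the DataLoader yields (features, targets) pairs batched along the first dimension:

```python
import torch
import torch.utils.data as data_utils

torch.manual_seed(0)

# Hypothetical placeholder data: 1000 samples, 20 features each (one sample per row)
features = torch.randn(1000, 20)
# One integer class label per sample (1-D targets, i.e. a scalar per sample)
targets = torch.randint(0, 2, (1000,))

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

batch_shapes = []
for batch_features, batch_targets in train_loader:
    # Each batch stacks 50 feature rows and the matching 50 targets
    batch_shapes.append((tuple(batch_features.shape), tuple(batch_targets.shape)))

print(len(train_loader))  # 1000 / 50 = 20 batches
print(batch_shapes[0])    # ((50, 20), (50,))
```

With 1,000 samples and batch_size=50 the loader produces exactly 20 batches; if the sample count were not a multiple of the batch size, the last batch would simply be smaller unless drop_last=True is passed.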

Hope that helps!

EDIT: response to @sarthak's question

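Basically yes. If you create an object of type TensorDataset, the constructor checks whether the first dimension of the feature tensor (called data_tensor in older PyTorch releases) has the same length as that of the target tensor (called target_tensor). Current releases take any number of tensors, TensorDataset(*tensors), and apply the same check to all of them. In the older releases the check reads: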

assert data_tensor.size(0) == target_tensor.size(0)
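A small sketch of that check with 4-D data similar to yours, assuming current PyTorch, where TensorDataset verifies the shared first dimension with a plain assert (so a mismatch raises AssertionError, and the check would be skipped under python -O). The sizes below are hypothetical:

```python
import torch
from torch.utils.data import TensorDataset

# Matching first dimensions: 100 samples of 4-D image-like data and 100 labels
data = torch.randn(100, 8, 8, 3)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(data, labels)

# Indexing slices every tensor along the first (sample) dimension
sample, label = dataset[0]
print(len(dataset), tuple(sample.shape))  # 100 (8, 8, 3)

# Mismatched first dimensions (100 vs 99) are rejected by the size check
try:
    TensorDataset(torch.randn(100, 5), torch.randn(99))
    mismatch_rejected = False
except AssertionError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```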

However, if you want to feed these data into a neural network afterwards, you need to be careful. Convolution layers can work on image-shaped data like yours, although nn.Conv2d expects channels first (N×C×H×W), so a 5000×n×n×3 tensor would first need its axes reordered, e.g. with .permute(0, 3, 1, 2). Fully connected layers (nn.Linear), by contrast, act only on the last dimension, so to treat each image as a single feature vector you need to flatten it. An easy solution is to convert your 4-D dataset (given as some kind of tensor, e.g. FloatTensor) into a matrix using the view method. For your 5000×n×n×3 dataset, this would look like this:

dataset_2d = dataset_4d.view(5000, -1)

(The value -1 tells PyTorch to figure out the length of the second dimension automatically.)
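A sketch of both options, assuming a hypothetical side length n = 8 and random data standing in for your 5000×n×n×3 dataset: view flattens each sample for a fully connected layer, while permute reorders the axes into the N×C×H×W layout nn.Conv2d expects. The layer sizes (10 outputs, 16 channels) are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

n = 8  # hypothetical image side length
dataset_4d = torch.randn(5000, n, n, 3)  # N x H x W x C, as in the question

# Flatten each sample into one row; -1 lets PyTorch infer n * n * 3 = 192
dataset_2d = dataset_4d.view(5000, -1)
print(dataset_2d.shape)  # torch.Size([5000, 192])

# A fully connected layer then takes n * n * 3 input features per sample
linear = nn.Linear(n * n * 3, 10)
print(linear(dataset_2d[:4]).shape)  # torch.Size([4, 10])

# nn.Conv2d expects channels first (N x C x H x W), so permute instead of flattening
dataset_nchw = dataset_4d.permute(0, 3, 1, 2)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(conv(dataset_nchw[:4]).shape)  # torch.Size([4, 16, 8, 8])
```

Note that view requires a compatible memory layout (it works here because torch.randn returns a contiguous tensor); reshape is the more forgiving alternative if the tensor has already been permuted or sliced.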
