What is the fastest way of loading images?

Question

I have about 200,000 high-resolution images, and loading such high-quality images every time is time-consuming. Preloading all of them might occupy too much memory. How about saving each image as an .npz file and loading the .npz instead of the .jpg? Would that improve loading speed?

Recommended answer

You do not need to load all the images into memory at once. Since data augmentation is usually applied to the dataset on the fly during model training, preloading every image is not practical anyway.

In PyTorch, you can use a Dataset to represent your training and validation sets. The dataset classes in torchvision accept a transform argument (e.g., Scale, which is named Resize in current torchvision releases, RandomCrop, etc.) that is used to transform each training image on the fly during training. Several ready-made datasets are also provided by the torchvision package (see its datasets documentation).

PyTorch's built-in DataLoader has a num_workers argument, which controls how many subprocesses are used for loading the data. Since your dataset is not that large, this should be enough for your needs. The original answer also links to a discussion on how to choose an appropriate number of workers.
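A minimal sketch of how the worker count is passed to a DataLoader is shown below. The helper make_loader, the batch size, and the random stand-in tensors are assumptions for illustration only; in practice the dataset would be one that reads your image files.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=4, num_workers=2):
    """Hypothetical helper: num_workers > 0 loads batches in subprocesses."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,  # 0 means loading happens in the main process
        pin_memory=torch.cuda.is_available(),
    )


# Stand-in data (assumption): 10 random "images" of shape 3x8x8 with integer labels.
images = torch.randn(10, 3, 8, 8)
labels = torch.arange(10)
toy_dataset = TensorDataset(images, labels)

if __name__ == "__main__":
    # The guard matters on platforms that start workers with "spawn".
    loader = make_loader(toy_dataset, num_workers=2)
    for batch_images, batch_labels in loader:
        print(batch_images.shape, batch_labels.shape)
```

A reasonable way to pick the value is to time one pass over the loader for a few settings (for example 0, 2, 4, 8) and keep the smallest one after which throughput stops improving.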

There are discussions on the PyTorch forum about fast image loading; the original answer links to two posts (post1 and post2) as a starting point.
