How to speed up the "ImageFolder" for ImageNet

Problem description

I am at a university where all file systems live on a remote system, so wherever I log in with my account I can always access my home directory, even when I log into the GPU servers over SSH. This is the setup under which the GPU servers read my data.

Currently I am using PyTorch to train a ResNet from scratch on ImageNet. My code uses all the GPUs on a single machine. I found that "torchvision.datasets.ImageFolder" takes almost two hours.

Could you share any experience with speeding up "torchvision.datasets.ImageFolder"? Thanks very much.

Recommended answer

Why does it take so long?
Setting up an ImageFolder can take a long time, especially when the images are stored on a slow remote disk. The reason for this latency is that the __init__ function of the dataset goes over all files in the image folders and checks whether each one is an image file. For ImageNet that can take quite a while, as there are over 1 million files to check.
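
To check whether the directory scan really is the bottleneck on a given file system, one can time a walk that is similar in spirit to what the dataset constructor does. This is only a minimal sketch: the names `count_image_files` and `time_scan` and the extension list are illustrative assumptions, not torchvision's actual code.

```python
import os
import time

# Illustrative extension list; an assumption, not copied from torchvision.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def count_image_files(root):
    """Walk root recursively and count files whose name looks like an image."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(IMG_EXTENSIONS):
                count += 1
    return count


def time_scan(root):
    """Return (number of image files found, seconds spent scanning)."""
    start = time.perf_counter()
    n = count_image_files(root)
    return n, time.perf_counter() - start
```

Running `time_scan` once on the remote directory and once on a local copy gives a rough idea of how much of the delay comes from the file system itself. Note that a second scan of the same tree may be faster because of file-system caching.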

What can you do?
- As Kevin Sun already pointed out, copying the dataset to local (and possibly much faster) storage can speed things up significantly.
- Alternatively, you can create a modified dataset class that does not read all the files but relies on a cached list of files: a list that you prepare only once, in advance, and then reuse for all runs.
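
The second option can be sketched roughly as follows. This is not torchvision's implementation: `CachedImageFolder`, `scan_image_folder`, the JSON cache format, and the extension list are all hypothetical names and choices made for illustration. The class only implements `__len__` and `__getitem__`, which should be enough for it to be used as a map-style dataset; the default loader assumes Pillow is installed.

```python
import json
import os

# Illustrative extension list; an assumption, not copied from torchvision.
IMG_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def scan_image_folder(root):
    """Slow path: list [relative_path, class_index] pairs, one class per subdirectory."""
    classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
    samples = []
    for idx, cls in enumerate(classes):
        for dirpath, _dirs, files in sorted(os.walk(os.path.join(root, cls))):
            for name in sorted(files):
                if name.lower().endswith(IMG_EXTENSIONS):
                    full = os.path.join(dirpath, name)
                    samples.append([os.path.relpath(full, root), idx])
    return classes, samples


def default_loader(path):
    """Load an image as RGB; assumes Pillow is available."""
    from PIL import Image
    with open(path, "rb") as f:
        return Image.open(f).convert("RGB")


class CachedImageFolder:
    """Hypothetical ImageFolder-like dataset that reuses a cached file list."""

    def __init__(self, root, cache_file, loader=default_loader, transform=None):
        self.root = root
        self.loader = loader
        self.transform = transform
        if os.path.exists(cache_file):
            # Fast path: read the list prepared by an earlier run.
            with open(cache_file) as f:
                cached = json.load(f)
            self.classes, self.samples = cached["classes"], cached["samples"]
        else:
            # First run only: scan the folders once and store the result.
            self.classes, self.samples = scan_image_folder(root)
            with open(cache_file, "w") as f:
                json.dump({"classes": self.classes, "samples": self.samples}, f)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        rel_path, target = self.samples[index]
        img = self.loader(os.path.join(self.root, rel_path))
        if self.transform is not None:
            img = self.transform(img)
        return img, target
```

The cache is never refreshed automatically in this sketch, so it goes stale if files are added or removed; deleting the cache file forces a new scan. Keeping the cache file on local storage avoids another round trip to the remote disk.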
