Pytorch dataloader, too many threads, too much cpu memory allocation
Problem description
I'm training a model using PyTorch. To load the data, I'm using `torch.utils.data.DataLoader`. The data loader uses a custom dataset I've implemented. A strange problem occurs: every time the second `for` loop in the following code executes, the number of threads/processes increases and a huge amount of memory is allocated.
```python
for epoch in range(start_epoch, opt.niter + opt.niter_decay + 1):
    epoch_start_time = time.time()
    if epoch != start_epoch:
        epoch_iter = epoch_iter % dataset_size
    for i, item in tqdm(enumerate(dataset, start=epoch_iter)):
        ...
```
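For context on why the second `for` matters: each pass over an iterable calls `__iter__()` again, and for a `DataLoader` that means a fresh iterator (and a fresh set of worker processes) per epoch. A torch-free toy sketch of that behavior (the `CountingIterable` class is illustrative, not part of the question's code):

```python
class CountingIterable:
    """Toy stand-in for DataLoader: counts how many iterators are created.
    Each `for` loop over it calls __iter__ once, just as each epoch's inner
    `for i, item in enumerate(dataset)` creates a fresh DataLoader iterator."""

    def __init__(self, items):
        self.items = items
        self.iterators_created = 0

    def __iter__(self):
        self.iterators_created += 1  # a DataLoader would spawn workers here
        return iter(self.items)

loader = CountingIterable([1, 2, 3])
for epoch in range(5):        # mirrors the outer epoch loop above
    for item in loader:       # a new iterator per epoch
        pass
print(loader.iterators_created)  # 5
```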
I suspect the threads and memory of the previous iterators are not released after each `__iter__()` call to the data loader.

The allocated memory is close to the amount of memory allocated by the main thread/process at the time the worker threads are created. That is, in the initial epoch the main thread is using 2 GB of memory, so 2 threads of size 2 GB are created. In the next epochs, 5 GB of memory is allocated by the main thread, and two 5 GB threads are constructed (`num_workers` is 2).

I suspect that the `fork()` function copies most of the parent's context into the new threads.
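That suspicion can be illustrated with the standard library alone: on Unix, a forked child inherits a copy-on-write view of the parent's entire address space, which is why each worker's reported size tracks the parent's size at fork time. A minimal sketch, assuming a Unix platform (no PyTorch; the names here are illustrative):

```python
import multiprocessing as mp

data = None  # stands in for the training process's growing state

def child_sees_parent_data(q):
    # After fork(), the child inherits the parent's address space
    # (copy-on-write), so `data` is visible without being passed in.
    q.put(len(data))

def demo():
    global data
    data = list(range(100_000))      # parent "grows" before forking
    ctx = mp.get_context("fork")     # the start method PyTorch uses on Linux
    q = ctx.Queue()
    p = ctx.Process(target=child_sees_parent_data, args=(q,))
    p.start()
    inherited = q.get()
    p.join()
    return inherited

if __name__ == "__main__":
    print(demo())  # 100000 — the child saw the parent's list
```

The pages are only *copied* when written, but tools like Activity Monitor often report the full inherited size, matching the observation above.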
The following is the Activity Monitor output showing the processes created by Python; the `ZMQbg/1` entries are processes related to Python.
My dataset used by the data loader has 100 sub-datasets; the `__getitem__` call randomly selects one, ignoring the `index`. (The sub-datasets are `AlignedDataset` from the pix2pixHD GitHub repository.)
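A minimal sketch of that selection logic, in plain Python with no torch dependency (the `RandomSubDataset` class and the toy data are hypothetical, not the question's actual code):

```python
import random

class RandomSubDataset:
    """Wraps several sub-datasets; __getitem__ ignores the given index
    and instead samples an item from a randomly chosen sub-dataset,
    mirroring the behaviour described in the question."""

    def __init__(self, sub_datasets):
        self.sub_datasets = sub_datasets

    def __len__(self):
        return sum(len(d) for d in self.sub_datasets)

    def __getitem__(self, index):
        sub = random.choice(self.sub_datasets)  # `index` is ignored on purpose
        return sub[random.randrange(len(sub))]

ds = RandomSubDataset([["a1", "a2"], ["b1"], ["c1", "c2", "c3"]])
item = ds[0]  # any item from any sub-dataset may come back
```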
Recommended answer
`torch.utils.data.DataLoader` prefetches `2 * num_workers` samples, so that data is always ready to send to the GPU/CPU; this could be the reason you see the memory increase.
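The prefetching idea can be sketched without PyTorch: a background producer keeps a bounded buffer of `2 * num_workers` items filled ahead of the consumer. This is a toy model of the behaviour, not the DataLoader implementation:

```python
from queue import Queue
from threading import Thread

def prefetching_iter(source, num_workers=2):
    """Yield items from `source`, keeping up to 2 * num_workers items
    buffered ahead of the consumer — a toy model of DataLoader's
    prefetch depth."""
    buf = Queue(maxsize=2 * num_workers)
    sentinel = object()

    def producer():
        for item in source:
            buf.put(item)      # blocks once the buffer is full
        buf.put(sentinel)      # signal end of data

    Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            return
        yield item

print(list(prefetching_iter(range(5))))  # [0, 1, 2, 3, 4]
```

The bounded queue means up to `2 * num_workers` items sit in memory at any time on top of whatever the consumer holds, which is the extra allocation the answer refers to.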
https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html