How can I train dlib shape predictor using a very large training set

Problem Description

I'm trying to use the Python dlib.train_shape_predictor function to train using a very large set of images (~50,000).

I've created an XML file containing the necessary data, but it seems like train_shape_predictor loads all the referenced images into RAM before it starts training. This leads to the process getting terminated because it uses over 100 GB of RAM. Even trimming down the data set uses over 20 GB (the machine only has 16 GB of physical memory).

Is there some way to get train_shape_predictor to load images on demand, instead of all at once?

I'm using Python 3.7.2 and dlib 19.16.0, installed via pip on macOS.
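
For reference, this is roughly how the training call looks with the Python API. The file names training.xml and predictor.dat are placeholders, and the option values are only illustrative, not settings from the original question:

```python
import dlib

# Training options; the values below are illustrative, not tuned.
options = dlib.shape_predictor_training_options()
options.num_threads = 4      # parallelize training across CPU cores
options.be_verbose = True    # print progress while fitting the cascade

# "training.xml" is the imglab-style dataset file: each <image> entry names an
# image on disk and lists its <box> and <part> landmark annotations. dlib reads
# every referenced image into memory before training starts.
dlib.train_shape_predictor("training.xml", "predictor.dat", options)
```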

Solution

I posted this as an issue on the dlib GitHub and got this response from the author:

It's not reasonable to change the code to cycle back and forth between disk and RAM like that. It will make training very slow. You should instead buy more RAM, or use smaller images.

As designed, large training sets need tons of RAM.
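
The memory footprint comes from the decoded pixel data: 50,000 RGB images at, say, 1200×900 pixels occupy about 50,000 × 1200 × 900 × 3 bytes ≈ 160 GB uncompressed, so halving both dimensions cuts the requirement by a factor of four. Below is a rough sketch of the "use smaller images" route suggested above: it shrinks every image referenced by the dataset XML and scales the box and part coordinates to match. The file names, the 0.5 factor, and the use of Pillow are assumptions for illustration, not part of the original answer.

```python
import os
import xml.etree.ElementTree as ET
from PIL import Image  # assumes Pillow is installed

SCALE = 0.5  # illustrative shrink factor; pick whatever your RAM allows

tree = ET.parse("training.xml")           # hypothetical dataset file name
for image in tree.iter("image"):
    path = image.get("file")
    root, ext = os.path.splitext(path)
    small_path = root + "_small" + ext

    # Resize the image and write it next to the original.
    with Image.open(path) as img:
        new_size = (int(img.width * SCALE), int(img.height * SCALE))
        img.resize(new_size, Image.LANCZOS).save(small_path)

    # Point the annotation at the smaller file and scale its coordinates.
    image.set("file", small_path)
    for box in image.iter("box"):
        for attr in ("top", "left", "width", "height"):
            box.set(attr, str(round(int(box.get(attr)) * SCALE)))
        for part in box.iter("part"):
            part.set("x", str(round(int(part.get("x")) * SCALE)))
            part.set("y", str(round(int(part.get("y")) * SCALE)))

tree.write("training_small.xml")          # train on this reduced dataset
```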
