How can I train dlib shape predictor using a very large training set
Question
I'm trying to use the python dlib.train_shape_predictor
function to train using a very large set of images (~50,000).
I've created an XML file containing the necessary data, but it seems that train_shape_predictor loads all the referenced images into RAM before it starts training. This causes the process to be killed because it uses over 100 GB of RAM. Even a trimmed-down dataset uses over 20 GB (the machine has only 16 GB of physical memory).
Is there some way to get train_shape_predictor to load images on demand, instead of all at once?
I'm using python 3.7.2 and dlib 19.16.0 installed via pip on macOS.
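A rough back-of-the-envelope estimate shows why the memory use is so high. Assuming each image is decoded into an uncompressed 8-bit RGB array and all of them are held in RAM at once (the image resolution below is a hypothetical example, not from the question):

```python
def training_ram_bytes(n_images, width, height, channels=3):
    """Lower bound on RAM if every image is decoded to an 8-bit
    array (width * height * channels bytes) and kept in memory."""
    return n_images * width * height * channels

# ~50,000 images at a modest 1000x1000 RGB already need ~150 GB,
# before any feature extraction or training overhead:
gb = training_ram_bytes(50_000, 1000, 1000) / 1e9  # → 150.0
```

At that scale, JPEG compression on disk is misleading: a few-hundred-kilobyte file can easily decode to several megabytes in memory.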
I posted this as an issue on the dlib github and got this response from the author:
It's not reasonable to change the code to cycle back and forth between disk and ram like that. It will make training very slow. You should instead buy more RAM, or use smaller images.
As designed, large training sets need tons of RAM.
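Following the author's "use smaller images" advice means downscaling the image files and scaling the annotations in the training XML by the same factor. A minimal sketch of the annotation side, assuming the standard imglab-style XML layout (`<box>` elements with `top`/`left`/`width`/`height` attributes containing `<part>` landmarks); resizing the image files themselves (e.g. with Pillow) is left out:

```python
import xml.etree.ElementTree as ET

def scale_annotations(xml_text, factor):
    """Scale every bounding box and landmark in an imglab-style
    dlib training XML by `factor` (e.g. 0.5 if images are halved)."""
    root = ET.fromstring(xml_text)
    for box in root.iter("box"):
        for attr in ("top", "left", "width", "height"):
            box.set(attr, str(round(int(box.get(attr)) * factor)))
        for part in box.iter("part"):
            for attr in ("x", "y"):
                part.set(attr, str(round(int(part.get(attr)) * factor)))
    return ET.tostring(root, encoding="unicode")
```

Halving both dimensions cuts decoded image memory to a quarter, which is often enough to bring a dataset of this size within reach of a machine with more (but not extreme) RAM.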