TensorFlow Object Detection API - Out of Memory


Problem Description


I am using the Tensorflow Object Detection API to train my own object detector. I downloaded faster_rcnn_inception_v2_coco_2018_01_28 from the model zoo (here), and made my own dataset (train.record (~221 MB), test.record and the label map) to fine-tune it.

But when I run it:

python train.py --logtostderr --pipeline_config_path=/home/username/Documents/Object_Detection/training/faster_rcnn_inception_v2_coco_2018_01_28/pipeline.config --train_dir=/home/username/Documents/Object_Detection/training/

the process is killed while filling up the shuffle buffer, which looks like an OOM problem (16 GB of RAM):

2018-06-07 12:02:51.107021: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:94] Filling up shuffle buffer (this may take a while): 410 of 2048
Process stopped

Is there a way to reduce the shuffle buffer size? What determines its size?

Then I added some swap (115 GB swap + 16 GB RAM) and the shuffle buffer finished filling, but my training consumed all the RAM and swap after step 4, even though my train.record is only about 221 MB!

I have already added these lines to my pipeline.config > train_config:

batch_size: 1
batch_queue_capacity: 10
num_batch_queue_threads: 8
prefetch_queue_capacity: 9

and these to my pipeline.config > train_input_reader:

queue_capacity: 2
min_after_dequeue: 1
num_readers: 1

following this post.
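
(A note on the shuffle-buffer question above: the log line comes from shuffle_dataset_op, so the input pipeline is the tf.data-based one, and object_detection/protos/input_reader.proto exposes a shuffle_buffer_size field whose default of 2048 matches the log; whether my 2018 checkout already has that field is an assumption to verify against the proto. If the buffer holds the serialized records before decoding, which I have not verified, ~25 MB examples times 2048 slots would need on the order of 50 GB on its own and would explain the OOM during the filling phase. A minimal sketch of a reduced reader, with hypothetical paths:)

train_input_reader {
  label_map_path: "/home/username/Documents/Object_Detection/training/label_map.pbtxt"  # hypothetical path
  shuffle_buffer_size: 64    # assumed field name (check input_reader.proto); default is 2048
  tf_record_input_reader {
    input_path: "/home/username/Documents/Object_Detection/training/train.record"       # hypothetical path
  }
}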

I know my images are very (very, very) large: 25 MB each, but since I only used 9 images to build my train.record (just to test whether my installation went well), it should not be so memory-consuming, right?

Any other ideas about why it uses so much RAM?

(BTW, I am only using the CPU.)

Solution

The number of images is not the problem. The problem is the input image resolution, which is set in your .config file. You need to change the height and width values here (your .config file will contain something similar):

image_resizer {
  # TODO(shlens): Only fixed_shape_resizer is currently supported for NASNet
  # featurization. The reason for this is that nasnet.py only supports
  # inputs with fully known shapes. We need to update nasnet.py to handle
  # shapes not known at compile time.
  fixed_shape_resizer {
    height: 1200
    width: 1200
  }
}

Set smaller width and height values and you will be able to train without running out of memory.
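
For scale: the 25 MB per image is the compressed size on disk. If such a JPEG is roughly 8000 x 6000 pixels (an assumed resolution, since the question does not give one), decoding it to uint8 RGB already costs about 8000 * 6000 * 3 ≈ 144 MB, and the float32 tensors used during training are four times larger again, before any intermediate activations. Also note that the faster_rcnn_inception_v2_coco pipeline normally ships with a keep_aspect_ratio_resizer rather than the fixed_shape_resizer shown above, so the block to shrink in your .config may look more like this sketch (the dimension values are illustrative, not tuned):

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600     # illustrative; lower both values to cut memory further
    max_dimension: 1024
  }
}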
