RAM Error Running the TensorFlow Object Detection API

Question

After installing the TensorFlow Object Detection API and following all the instructions, I started training on my own dataset. Very quickly the program began to use all the RAM, and the process was killed. I have read all the posts available on this subject, and nobody seems to have an answer. This is another attempt to figure out what causes this problem.

Computer specs:

  1. 12 GB RAM
  2. Ubuntu 14.04 LTS
  3. tensorflow-gpu
  4. NVIDIA GTX 1070 - 8.0 GB

The log is:

INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
Killed

From what others have written, it is definitely a memory-usage problem. Any help is welcome.

Recommended Answer

You can fix this by tweaking the config files.

It turns out that most of the RAM is consumed by the input queues. But since the data is in the super-fast TFRecord format, there is no need to keep that many examples prepared.
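To see why large queues can exhaust 12 GB so quickly, here is a rough back-of-envelope sketch in Python. It assumes every queued element is a decoded uint8 RGB image and ignores boxes, labels and framework overhead; the function names and the 1024x1024 image size are hypothetical, chosen only for illustration.

```python
# Rough estimate of the RAM held by a full queue of decoded images.
# Assumption: each queued element is one decoded image of the given shape;
# real queues also hold boxes, labels and TensorFlow/Python overhead.


def queue_bytes(capacity: int, height: int, width: int, channels: int = 3,
                bytes_per_value: int = 1) -> int:
    """Approximate number of bytes a completely full queue occupies."""
    return capacity * height * width * channels * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical example: 1000 decoded 1024x1024 RGB uint8 images.
    full = queue_bytes(capacity=1000, height=1024, width=1024)
    print(f"{to_gib(full):.2f} GiB")  # 2.93 GiB
```

If preprocessing converts images to float32, pass `bytes_per_value=4` and the estimate quadruples.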

Depending on which model you use (SSD or Faster R-CNN), these settings will vary, since SSD can use larger batches than FRCNN (which basically uses a batch size of 1).

Find or add the following fields in your config file and experiment with the queue sizes.

train_config: {
  # ... other settings  
  batch_size: 1 # this is for FRCNN
  batch_queue_capacity: 10
  num_batch_queue_threads: 4
  prefetch_queue_capacity: 5
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/path/to/train.tfrecord"
  }
  label_map_path: "/path/to/label/map.pbtxt"
  queue_capacity: 400
  min_after_dequeue: 200
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/path/to/eval.tfrecord"
  }
  label_map_path: "/path/to/label/map.pbtxt"
  shuffle: true
  queue_capacity: 20
  min_after_dequeue: 10
  num_readers: 1
}
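A minimal sketch of how the values above might translate into memory, in Python. It assumes (this is not taken from the API documentation) that `batch_queue_capacity` counts individual decoded examples, that each of the `prefetch_queue_capacity` slots holds one full batch, and that the input reader's `queue_capacity` counts serialized records of roughly JPEG size. `QueueSettings`, `estimate_bytes` and the image and record sizes are hypothetical.

```python
# Hypothetical back-of-envelope estimate for the train_config and
# train_input_reader values above. Assumptions (not from the API docs):
#  - the batch and prefetch queues hold decoded uint8 images,
#  - each prefetch slot holds one batch of batch_size examples,
#  - the input reader queue holds serialized records (about the JPEG size),
#  - boxes, labels and framework overhead are ignored.
from dataclasses import dataclass


@dataclass
class QueueSettings:
    batch_size: int
    batch_queue_capacity: int
    prefetch_queue_capacity: int
    queue_capacity: int


def estimate_bytes(s: QueueSettings, decoded_image_bytes: int,
                   record_bytes: int) -> int:
    """Approximate bytes held when every queue is full."""
    decoded_examples = (s.batch_queue_capacity
                        + s.prefetch_queue_capacity * s.batch_size)
    return (decoded_examples * decoded_image_bytes
            + s.queue_capacity * record_bytes)


if __name__ == "__main__":
    tuned = QueueSettings(batch_size=1, batch_queue_capacity=10,
                          prefetch_queue_capacity=5, queue_capacity=400)
    decoded = 1024 * 1024 * 3  # hypothetical 1024x1024 RGB image
    record = 200 * 1024        # hypothetical 200 KiB serialized record
    mib = estimate_bytes(tuned, decoded, record) / 1024 ** 2
    print(f"{mib:.1f} MiB")  # 123.1 MiB
```

With a larger `batch_size`, as SSD can use, the prefetch term grows in proportion, which is one reason the queue settings need to differ between models.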

These and other settings were found by inspecting the .proto files in object_detection/protos, which describe all the settings of the models.
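If you want to look for such settings yourself, a short Python script can list the queue-related fields (and any `[default = ...]` values) declared in those .proto files. This is a minimal regex-based sketch, not a real protobuf parser; the `object_detection/protos` path is assumed to be relative to the directory that contains `object_detection`, and `find_fields` is a hypothetical helper.

```python
# Minimal sketch: list fields whose name contains a keyword (default "queue")
# in the .proto files of a directory. Regex-based, so it only understands
# simple one-line field declarations such as
#   optional uint32 queue_capacity = 3 [default = 2000];
import re
from pathlib import Path

FIELD_RE = re.compile(
    r"^\s*(?:(?:optional|required|repeated)\s+)?[\w.]+\s+(\w+)\s*=\s*\d+"
    r"(?:\s*\[default\s*=\s*([^,\]]+))?"
)


def find_fields(proto_dir: str, keyword: str = "queue"):
    """Yield (file name, field name, default value or None)."""
    for path in sorted(Path(proto_dir).glob("*.proto")):
        for line in path.read_text().splitlines():
            m = FIELD_RE.match(line)
            if m and keyword in m.group(1):
                default = m.group(2).strip() if m.group(2) else None
                yield path.name, m.group(1), default


if __name__ == "__main__":
    # Assumed path; adjust it to your own checkout.
    for name, field, default in find_fields("object_detection/protos"):
        print(f"{name}: {field} (default: {default})")
```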
