Determining max batch size with the TensorFlow Object Detection API

Problem description

TF Object Detection API grabs all GPU memory by default, so it's difficult to tell how much I can further increase my batch size. Typically I just continue to increase it until I get a CUDA OOM error.

PyTorch, on the other hand, doesn't grab all GPU memory by default, so it's easy to see what percentage I have left to work with, without all the trial and error.
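
For comparison, here is a minimal sketch (not part of the original question) of checking how much GPU memory is still free in PyTorch; it assumes a reasonably recent PyTorch build where torch.cuda.mem_get_info is available:

import torch

# Query free/total memory on GPU 0 (wraps cudaMemGetInfo).
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print("GPU 0: %.1f%% free (%.2f / %.2f GiB)" % (
    100.0 * free_bytes / total_bytes,
    free_bytes / 2**30,
    total_bytes / 2**30))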

Is there a better way to determine batch size with the TF Object Detection API that I'm missing? Something like an allow-growth flag for model_main.py?

Answer

I have looked through the source code and found no FLAG related to this.

However, in the file model_main.py (https://github.com/tensorflow/models/blob/master/research/object_detection/model_main.py) you can find the following main function definition:

def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')
  config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir)

  train_and_eval_dict = model_lib.create_estimator_and_inputs(
      run_config=config,
...

The idea would be to modify it in a similar manner, such as the following:

config_proto = tf.ConfigProto()
config_proto.gpu_options.allow_growth = True

config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir, session_config=config_proto)

So, we add config_proto and change config, while keeping everything else the same.
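
Putting the pieces together, a sketch of what the modified main in model_main.py could look like (the unchanged lines are the ones quoted above; only the ConfigProto and the RunConfig call are new):

def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')

  # Let TensorFlow grow its GPU memory allocation on demand instead of
  # reserving all of it at startup.
  config_proto = tf.ConfigProto()
  config_proto.gpu_options.allow_growth = True

  # Pass the session config to the Estimator through RunConfig; everything
  # else stays as in the original file.
  config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir,
                                  session_config=config_proto)

  train_and_eval_dict = model_lib.create_estimator_and_inputs(
      run_config=config,
...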

Also, allow_growth lets the program use as much GPU memory as it needs. So, depending on your GPU, you might still end up with all the memory consumed. In this case you may want to use

config_proto.gpu_options.per_process_gpu_memory_fraction = 0.9

which defines the fraction of the GPU memory the process is allowed to use.
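
For example (not from the original answer; 0.9 is just an illustrative value), the fraction can be set on the same config_proto before building the RunConfig; on an 11 GB card it would cap the process at roughly 9.9 GB:

config_proto = tf.ConfigProto()
# Hard-cap this process at ~90% of the GPU's memory; this can be combined
# with allow_growth if desired.
config_proto.gpu_options.per_process_gpu_memory_fraction = 0.9

config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir,
                                session_config=config_proto)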

Hope this helps.

If you do not want to modify the file, it seems that an issue should be opened, because I do not see any FLAG for this. Unless the FLAG

flags.DEFINE_string('pipeline_config_path', None, 'Path to pipeline config '
                    'file.')

means something related to this. But I do not think so, because from what I can see in model_lib.py it relates to the train, eval and infer configurations, not to GPU usage configuration.
