TensorFlow v1.10+ load SavedModel with different device placement or manually set dynamic device placement?

Question

TensorFlow's guide for using GPUs has a part about using multiple GPUs in a "multi-tower fashion":

...
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d): # <---- manual device placement
...

Seeing this, one might be tempted to use this style for multi-GPU training in a custom Estimator, to indicate to the model that it can be distributed efficiently across multiple GPUs.

To my knowledge, if manual device placement is absent, TensorFlow does not perform any form of optimal device mapping (except perhaps preferring the GPU over the CPU if you have the GPU version installed and a GPU is available). So what other choice do you have?

Anyway, you carry on training your estimator, export it to a SavedModel via estimator.export_savedmodel(...), and wish to use this SavedModel later, perhaps on a different machine that may not have as many GPUs as the machine on which the model was trained (or may have no GPUs at all).

So when you run

from tensorflow.contrib import predictor
predict_fn = predictor.from_saved_model(model_dir)

you get

Cannot assign a device for operation <OP-NAME>. Operation was 
explicitly assigned to <DEVICE-NAME> but available devices are 
[<AVAILABLE-DEVICE-0>,...]

An older S.O. post suggests that changing device placement was not possible, but hopefully things have changed over time.

Hence my questions are:

  1. When loading a SavedModel, can I change the device placement to suit the machine it is loaded on? E.g. if I train a model with 6 GPUs and a friend wants to run it at home with their e-GPU, can they remap '/device:GPU:1' through '/device:GPU:5' to '/device:GPU:0'?

  2. If 1 is not possible, is there a (painless) way for me, in the custom Estimator's model_fn, to specify how to generically distribute a graph?

e.g.

with tf.device('available-gpu-3'):

where available-gpu-3 is the third available GPU if there are three or more GPUs, otherwise the second or first available GPU, and the CPU if there is no GPU.

This matters because on a shared machine that is training two models, say one model on '/device:GPU:0' and the other explicitly on GPUs 1 and 2, the placement is tied to that machine: on another 2-GPU machine, GPU 2 will not be available.
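The "available-gpu-3" idea above could be sketched as a small helper that picks a device name from whatever devices the current machine reports. This is only an illustration under stated assumptions: the function name pick_available_device is hypothetical, it takes a plain list of device-name strings rather than calling TensorFlow itself, and it is not something the TensorFlow guide provides.

```python
import re

# Matches names such as '/device:GPU:0' or '/gpu:0', but not '/device:XLA_GPU:0'.
_GPU_NAME = re.compile(r"[:/]GPU:(\d+)$", re.IGNORECASE)


def pick_available_device(available, n):
    """Return the n-th available GPU (1-based), falling back gracefully.

    Hypothetical helper: if fewer than n GPUs are listed, the last
    available GPU is returned; if none are listed, the CPU is returned.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    gpus = []
    for name in available:
        match = _GPU_NAME.search(name)
        if match:
            gpus.append((int(match.group(1)), name))
    if not gpus:
        return "/device:CPU:0"
    gpus.sort()  # order by numeric GPU index, not by string
    return gpus[min(n, len(gpus)) - 1][1]


# Example device lists (made up for illustration).
four_gpus = ["/device:CPU:0"] + ["/device:GPU:%d" % i for i in range(4)]
two_gpus = ["/device:CPU:0", "/device:GPU:0", "/device:GPU:1"]
cpu_only = ["/device:CPU:0"]
```

In TensorFlow 1.x the list of names could come from device_lib.list_local_devices() (imported from tensorflow.python.client), and the result could be passed to tf.device(...) inside model_fn. Note that this only chooses a device when the graph is built; the chosen name would still be recorded in an exported graph unless devices are cleared at export time.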

Recommended answer

I have been researching this topic recently and, to my knowledge, your question 1 can work only if you clear all devices when you export the model in the original TensorFlow code, using the flag clear_devices=True.

In my own code, it looks like:

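import tensorflow as tf  # TF 1.x API

# sess is assumed to be the tf.Session holding the trained graph and variables.
builder = tf.saved_model.builder.SavedModelBuilder('osvos_saved')
builder.add_meta_graph_and_variables(sess, ['serve'], clear_devices=True)  # drop device assignments on export
builder.save()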
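As a rough picture of what clearing devices means: each node in a graph carries a device string, and clearing sets that string to empty so that placement is decided on the machine that loads the model. The sketch below uses stand-in objects rather than TensorFlow's protobuf classes; the clear_devices function and the node names are hypothetical and only mimic the idea.

```python
from types import SimpleNamespace


def clear_devices(graph_def):
    """Blank the device field of every top-level node.

    Hypothetical helper: returns how many nodes had an explicit device.
    """
    changed = 0
    for node in graph_def.node:
        if node.device:
            node.device = ""
            changed += 1
    return changed


# Stand-in for a graph definition with two explicitly placed nodes.
graph = SimpleNamespace(node=[
    SimpleNamespace(name="tower_0/matmul", device="/device:GPU:2"),
    SimpleNamespace(name="tower_1/matmul", device="/device:GPU:3"),
    SimpleNamespace(name="total", device=""),
])

n_cleared = clear_devices(graph)
```

After this runs, no node names a specific GPU, which is why a graph exported with clear_devices=True does not raise the "Cannot assign a device" error on a machine with fewer GPUs.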

If you only have an exported model, it does not seem to be possible. You can refer to this issue.

I'm currently trying to find a way to fix this, as described in my Stack Overflow question. I hope the workaround can help you.
