How to use `transform_graph` in Tensorflow


Problem description

I want to optimize my frozen, trained Tensorflow model. However, I found out that the optimize_for_inference library is no longer available.

import tensorflow as tf

from tensorflow.python.tools import freeze_graph
from tensorflow.python.tools import optimize_for_inference_lib

# Load the frozen graph from disk.
input_graph_def = tf.GraphDef()
with tf.gfile.Open("./inference_graph/frozen_model.pb", "rb") as f:
    data = f.read()
    input_graph_def.ParseFromString(data)

# Strip everything not needed between the listed input and output nodes.
output_graph_def = optimize_for_inference_lib.optimize_for_inference(
        input_graph_def,
        ["image_tensor"],  # input node
        ["detection_boxes", "detection_scores",
         "detection_classes", "num_detections"],  # output nodes
        tf.float32.as_datatype_enum)

# Write the optimized graph back to disk.
with tf.gfile.GFile("./optimized_model.pb", "wb") as f:
    f.write(output_graph_def.SerializeToString())
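A minimal sketch for sanity-checking whichever optimization path ends up being used: reload the written graph and compare node counts before and after. It assumes the same file paths as above and only uses the TF 1.x GraphDef/gfile APIs already shown.

import tensorflow as tf

def load_graph_def(path):
    # Parse a serialized GraphDef from disk.
    graph_def = tf.GraphDef()
    with tf.gfile.Open(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def

original = load_graph_def("./inference_graph/frozen_model.pb")
optimized = load_graph_def("./optimized_model.pb")
print("nodes before: %d, nodes after: %d"
      % (len(original.node), len(optimized.node)))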

I found the transform_graph tool from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md#strip_unused_nodes to optimize my frozen model, and I was able to successfully generate a working optimized model for my object detection model. The purpose of generating an optimized version of the model is to improve its inference speed. I ran this command in bash (from the /tensorflow root directory):

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/Users/cvsanbuenaventura/Documents/tensorflow_fastlog/models/research/object_detection/inference_graph/frozen_inference_graph.pb \
--out_graph=/Users/cvsanbuenaventura/Documents/tensorflow_fastlog/models/research/object_detection/inference_graph/optimized_inference_graph-transform_graph-manyoutputs-planA2-v2.pb \
--inputs='image_tensor' \
--outputs='detection_boxes, detection_scores, detection_classes, num_detections' \
--transforms='fold_batch_norms
fold_old_batch_norms
fold_constants(ignore_errors=true)'
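TF 1.x also ships a Python wrapper around the same tool, tensorflow.tools.graph_transforms.TransformGraph, so the command above can be reproduced in a script. The following is a minimal sketch of that equivalent, assuming the same input/output node names and transforms as the bash call, with shortened file paths.

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load the frozen detection graph.
graph_def = tf.GraphDef()
with tf.gfile.Open("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Same transforms as in the bash invocation above.
transforms = [
    "fold_batch_norms",
    "fold_old_batch_norms",
    "fold_constants(ignore_errors=true)",
]

optimized_graph_def = TransformGraph(
    graph_def,
    ["image_tensor"],                         # inputs
    ["detection_boxes", "detection_scores",
     "detection_classes", "num_detections"],  # outputs
    transforms)

with tf.gfile.GFile("optimized_inference_graph.pb", "wb") as f:
    f.write(optimized_graph_def.SerializeToString())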

So my questions are:

  1. What do the transforms do? fold_batch_norms, fold_old_batch_norms, fold_constants(ignore_errors=true)
  2. I was able to successfully generate an optimized model using the three transforms above, but there are other transforms (e.g. strip_unused_nodes(type=float, shape="1,299,299,3")). What does this do, and what shape should I put here?
  3. Does the optimize_for_inference library not exist anymore?

Recommended answer

I was wondering more or less the same things as you.

  1. About explanations: I found this presentation, which goes into a bit too much detail, but slides 14 and 15 seem to have what you want to know about SimplifyGraph(): https://web.stanford.edu/class/cs245/slides/TFGraphOptimizationsStanford.pdf

  2. It seems that the "1,299,299,3" corresponds to an SSD-300x300 model, so I guess it is about forcing the input data to be resized to that shape. I've read that the idea of this optimization is to remove nodes that are required for full training but not for inference. In my case, I'm using a 1920x1080 FRCNN model, so I guess I would have to use "1,1080,1920,3" there.
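One way to decide what to pass to strip_unused_nodes is to look at what shape the frozen graph's input placeholder actually declares. A minimal sketch of that check, assuming a TF 1.x install and a graph file named frozen_inference_graph.pb:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.Open("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Print every placeholder with its declared dtype and shape; many object
# detection exports declare image_tensor with unknown spatial dimensions,
# in which case the shape passed to strip_unused_nodes is whatever you
# actually intend to feed at runtime.
for node in graph_def.node:
    if node.op == "Placeholder":
        print(node.name, node.attr["dtype"].type, node.attr["shape"].shape)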

  3. Most likely not... I would have to check the TensorFlow team's changelogs.

  4. I finally ran my tests. It seems that with Faster-RCNN (and possibly R-FCN) I don't get any GPU-inference benefit from an 'optimized for inference' model (my reference card is a GTX Titan X Maxwell, but I also have an AGX Xavier to test on). I tried a 'quantized' model with this instruction:

~/build/tensorflow/tf_1.12.3-cpu/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph='model.cas.f01-v2_aug_frcnn-1920-1080-dia.pb' \
--out_graph='opt-for-inf/opt_2q_model.cas.f01-v2_aug_frcnn-1920-1080-dia.pb' \
--inputs="image_tensor" \
--outputs="detection_boxes,detection_scores,detection_classes,num_detections" \
--transforms='add_default_attributes
strip_unused_nodes(type=float, shape="1,1080,1920,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
merge_duplicate_nodes
quantize_weights
sort_by_execution_order'

And it did not make the inference times any better (say, going on the Xavier from 1.2 seconds per inference down to 0.8 or so). Adding 'quantize_nodes' gave me a mismatch between the layers of the model, which made it unusable. Maybe it works differently for this topology, and I will need to explore more to see how to optimize this model for inference. It does seem to work for SSDs, though; I will test my own and publish the results.

  5. What I do know is that if, for training, you have access to at least a Volta-architecture GPU (Titan V or Tesla V100) or an RTX card, you can use an environment variable and train the model with mixed data types (FP16 where possible, with some parts kept in FP32). That makes a better model for inference, if you don't really need the precision. It depends on the use case: for medical images, you want the highest precision possible; for object detection of vehicles and the like, I guess you can trade precision for speed. Mixed-precision training with nVidia CUDA: https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#tensorflow-amp
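A minimal sketch of that env-var route, assuming TF 1.14+ and a GPU with Tensor Cores; TF_ENABLE_AUTO_MIXED_PRECISION is the variable described in the NVIDIA automatic-mixed-precision docs linked above:

import os

# Enable the automatic mixed-precision graph rewrite before building the
# training graph/session (assumes TF 1.14+ and a Volta/Turing-class GPU).
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

# ...then build and run the usual training graph; the rewrite inserts FP16
# casts where it considers them safe and keeps the rest in FP32.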

My other approach would be to try converting the model to TF-Lite and see how to run inference there. That's still on my backlog.
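A rough sketch of that TF-Lite route, assuming the TF 1.x frozen-graph converter and the same input/output node names as above; the fixed 1x1080x1920x3 input shape is an assumption, and detection graphs with control flow or post-processing ops may not convert cleanly:

import tensorflow as tf

# TF 1.x converter entry point for a frozen GraphDef file.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_inference_graph.pb",
    input_arrays=["image_tensor"],
    output_arrays=["detection_boxes", "detection_scores",
                   "detection_classes", "num_detections"],
    input_shapes={"image_tensor": [1, 1080, 1920, 3]})

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)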

I compiled tensorflow with bazel v0.19.x.
