Problem converting tensorflow saved_model from float32 to float16 using TensorRT (TF-TRT)

Problem Description

I have a TensorFlow (version 1.14) float32 SavedModel that I want to convert to float16. According to https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example , I can pass "FP16" as precision_mode to convert the model to fp16. But the converted model, when inspected in TensorBoard, is still fp32: the network parameters are DT_FLOAT instead of DT_HALF. The size of the converted model is also similar to the model before conversion. (My assumption is that, if the conversion succeeded, the model would become roughly half as large, since the parameters are halved.)

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string('saved_model_dir', '', 'Input saved model dir.')
tf.flags.DEFINE_bool('use_float16', False,
                     'Whether we want to quantize it to float16.')
tf.flags.DEFINE_string('output_dir', '', 'Output saved model dir.')


def main(argv):
    del argv  # Unused.
    saved_model_dir = FLAGS.saved_model_dir
    output_dir = FLAGS.output_dir
    use_float16 = FLAGS.use_float16

    # Request an FP16 TensorRT engine when asked; otherwise keep FP32.
    precision_mode = "FP16" if use_float16 else "FP32"
    converter = trt.TrtGraphConverter(input_saved_model_dir=saved_model_dir,
                                      precision_mode=precision_mode)
    converter.convert()          # rewrite the graph with TRTEngineOp nodes
    converter.save(output_dir)   # write the converted SavedModel


if __name__ == '__main__':
    tf.app.run(main)

Any advice or suggestions are very welcome! Thanks.

Recommended Answer

You are specifying the precision mode correctly for TF-TRT. However, checking the network parameters in TensorBoard does not reveal how the TensorRT engine stores the parameters of the converted model internally.
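One way to confirm that the conversion actually took effect is to look for TRTEngineOp nodes in the converted graph rather than at the weight dtypes. Below is a minimal sketch for TF 1.x; the directory name is a placeholder and the standard "serve" tag is assumed.

import tensorflow as tf
from collections import Counter

# Minimal sketch: count TRTEngineOp nodes in a converted SavedModel (TF 1.x).
# "converted_model_dir" is a placeholder path; the "serve" tag is assumed.
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], "converted_model_dir")
    op_counts = Counter(node.op for node in meta_graph.graph_def.node)

# TRTEngineOp nodes mark the subgraphs that TensorRT executes; the remaining
# TensorFlow nodes (kept as fallback) still carry the original FP32 weights,
# which is why TensorBoard still shows DT_FLOAT.
print("TRTEngineOp nodes:", op_counts.get("TRTEngineOp", 0))
print("Other ops:", sum(op_counts.values()) - op_counts.get("TRTEngineOp", 0))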

There are a few things to consider:

  • In TF-TRT we still keep the original TensorFlow weights after the model is converted to TensorRT. This is done to provide a fallback to native TensorFlow execution if the TensorRT path fails for some reason. As a result, the saved_model.pb file will be at least as large as the original model file.

  • The TensorRT engine contains a copy of the weights of the converted nodes. In FP16 mode, the TensorRT engine size will be roughly half the size of the original model (assuming that most of the nodes are converted). This is added to the original model size, so saved_model.pb would be 1.5x the size of the original model.

  • If we set is_dynamic_op=True (the default in TF2), then TensorRT engine creation is delayed until the first inference call. If we save the model before running the first inference, only a placeholder TRTEngineOp is added to the model, which does not really increase the model size. (A TF2 sketch for building the engines before saving follows this list.)

  • In TF2 the TensorRT engines are serialized into separate files inside the assets directory.
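To have the engines built and serialized at save time rather than at the first inference call, the TF2 converter lets you call build() with a representative input before save(). A minimal sketch, assuming TF 2.x (the exact conversion-params API varies slightly across 2.x releases), placeholder paths, and a placeholder input shape of (1, 224, 224, 3):

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Minimal TF2 sketch: build the TensorRT engines before saving so they are
# serialized under the output SavedModel's assets directory.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode="FP16")
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_dir",   # placeholder path
    conversion_params=params)
converter.convert()

def input_fn():
    # One representative batch is enough to trigger engine construction.
    yield (np.random.random((1, 224, 224, 3)).astype(np.float32),)

converter.build(input_fn=input_fn)  # engines are created here, not at first inference
converter.save("output_dir")        # engines end up under output_dir/assets

If build() is skipped (or, in TF1, if no inference is run before saving with is_dynamic_op=True), only the placeholder TRTEngineOp is written, which matches the behaviour described above.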
