How to run inference in fp16 with an fp32-trained model?


Problem description

I want to run inference with an fp32 model in fp16 to verify the half-precision results. After loading the checkpoint, the parameters can be converted to float16, but how do I then use these fp16 parameters in a session?

import tensorflow as tf  # TensorFlow 1.x

reader = tf.train.NewCheckpointReader(model_file)
var_to_map = reader.get_variable_to_dtype_map()

for key, val in var_to_map.items():
    tsr = reader.get_tensor(key)        # numpy array holding the fp32 weights
    val_f16 = tf.cast(tsr, tf.float16)  # cast to half precision

# sess.restore() ???

Recommended answer

I found a method to achieve it:

  1. Load the checkpoint with tf.train.NewCheckpointReader(), then read the parameters and convert them to float16.
  2. Use the converted float16 parameters to initialize the layers:

    # 'inits' maps variable names to the float16 arrays read from the checkpoint
    weight_name = scope_name + '/' + get_layer_str() + '/' + 'weight'
    initw = inits[weight_name]
    # dtype follows the initializer, so the variable is created as float16
    weight = tf.get_variable('weight', dtype=initw.dtype, initializer=initw)
    out = tf.nn.conv2d(self.get_output(), weight, strides=[1, stride, stride, 1], padding='SAME')

  3. Run the graph.
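The precision effect of the cast-then-run recipe above can be checked without TensorFlow at all. The following is a minimal NumPy sketch (the weight matrix and input are random stand-ins, not real checkpoint data): cast fp32 weights to float16, run the same computation in both precisions, and compare.

```python
import numpy as np

# Hypothetical fp32 "checkpoint" weights and input (random stand-ins).
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((3, 3)).astype(np.float32)
x_fp32 = rng.standard_normal((4, 3)).astype(np.float32)

# Steps 1-2 of the answer: cast the loaded params to float16.
w_fp16 = w_fp32.astype(np.float16)

# Step 3: run the "graph" (a single matmul here) in both precisions.
y_fp32 = x_fp32 @ w_fp32
y_fp16 = x_fp32.astype(np.float16) @ w_fp16

# The half-precision result should track the fp32 result closely.
max_err = np.max(np.abs(y_fp32 - y_fp16.astype(np.float32)))
print(max_err)
```

With values of this magnitude the maximum absolute deviation is on the order of float16's rounding error, which is the kind of gap one would expect when verifying half-precision inference against the fp32 baseline.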

My GPU is a GTX 1080 without tensor cores, yet inference with fp16 is 20%-30% faster than with fp32. I don't understand the reason. Which hardware units are used to compute fp16? The same units that normally handle fp32?
