Understanding tf.contrib.lite.TFLiteConverter quantization parameters


Question

I'm trying to use UINT8 quantization while converting a TensorFlow model to a TFLite model:

If I use post_training_quantize = True, the model size is 4x smaller than the original fp32 model, so I assume the model weights are uint8; but when I load the model and get the input type via interpreter_aligner.get_input_details()[0]['dtype'], it's float32. The outputs of the quantized model are about the same as those of the original model.

converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file='tflite-models/tf_model.pb',
        input_arrays=input_node_names,
        output_arrays=output_node_names)
converter.post_training_quantize = True
tflite_model = converter.convert()

Input/output of the converted model:

print(interpreter_aligner.get_input_details())
print(interpreter_aligner.get_output_details())
[{'name': 'input_1_1', 'index': 47, 'shape': array([  1, 128, 128,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]
[{'name': 'global_average_pooling2d_1_1/Mean', 'index': 45, 'shape': array([  1, 156], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0)}]

Another option is to specify more parameters explicitly: the model size is 4x smaller than the original fp32 model and the model input type is uint8, but the model outputs are more like garbage.

converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file='tflite-models/tf_model.pb',
        input_arrays=input_node_names,
        output_arrays=output_node_names)
converter.post_training_quantize = True
converter.inference_type = tf.contrib.lite.constants.QUANTIZED_UINT8
converter.quantized_input_stats = {input_node_names[0]: (0.0, 255.0)}  # (mean, stddev)
converter.default_ranges_stats = (-100, +100)
tflite_model = converter.convert()

Input/output of the converted model:

[{'name': 'input_1_1', 'index': 47, 'shape': array([  1, 128, 128,   3], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.003921568859368563, 0)}]
[{'name': 'global_average_pooling2d_1_1/Mean', 'index': 45, 'shape': array([  1, 156], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.7843137383460999, 128)}]

So my questions are:

  1. What happens when only post_training_quantize = True is set? I.e., why does the first case work fine but the second doesn't?
  2. How do I estimate the mean, std and range parameters for the second case?
  3. It looks like model inference is faster in the second case; does that depend on the fact that the model input is uint8?
  4. What does 'quantization': (0.0, 0) mean in the first case, and 'quantization': (0.003921568859368563, 0), 'quantization': (0.7843137383460999, 128) in the second case?
  5. What is converter.default_ranges_stats?

Update:

Found the answer to question 4: What does 'quantization' mean in interpreter.get_input_details()?

Answer

What happens when only post_training_quantize = True is set? I.e., why does the first case work fine but the second doesn't?

In TF 1.14, this seems to just quantize the weights stored on disk in the .tflite file. This does not, by itself, set the inference mode to quantized inference.

I.e., you can have a tflite model whose inference type is float32 but whose weights are quantized (using post_training_quantize=True), for the sake of smaller disk size and faster loading of the model at runtime.
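
For illustration, here is a minimal sketch (the file name weights_only_quant.tflite is an assumption) showing that a model converted with only post_training_quantize=True still exposes float32 inputs and outputs, so it is driven exactly like the original fp32 model:

import numpy as np
import tensorflow as tf

# Weight-only quantized model: weights are uint8 on disk, inference stays float.
interpreter = tf.lite.Interpreter(model_path='weights_only_quant.tflite')
interpreter.allocate_tensors()

input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]
print(input_detail['dtype'])          # <class 'numpy.float32'>
print(input_detail['quantization'])   # (0.0, 0) -- no input quantization

# Feed a float32 image exactly as with the original fp32 model.
dummy = np.random.rand(1, 128, 128, 3).astype(np.float32)
interpreter.set_tensor(input_detail['index'], dummy)
interpreter.invoke()
float_output = interpreter.get_tensor(output_detail['index'])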

How do I estimate the mean, std and range parameters for the second case?

The documentation is confusing to many. Let me explain what I concluded after some research:

  1. Unfortunately, the quantization parameters/stats have 3 equivalent forms/representations across the TF library and documentation:
    • A) (mean, std_dev)
    • B) (zero_point, scale)
    • C) (min, max)
  2. Conversion from B) and from C) to A):
    • std_dev = 1.0 / scale
    • mean = zero_point
    • mean = 255.0 * min / (min - max)
    • std_dev = 255.0 / (max - min)
    • Explanation: the quantization stats are the parameters used to map the uint8 range (0, 255) onto an arbitrary real range. You can start from the two equations min * std_dev + mean = 0 and max * std_dev + mean = 255 (i.e. quantized_value = real_value * std_dev + mean) and follow the math to reach the conversion formulas above (a helper sketch of these conversions follows this list).
  3. Conversion from A) to C):
    • min = -mean / std_dev
    • max = (255 - mean) / std_dev
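
For illustration, a small helper sketch of these conversions (the function names are mine, not part of the TF API):

def mean_std_from_scale_zero_point(scale, zero_point):
    """(zero_point, scale) -> (mean, std_dev)."""
    return float(zero_point), 1.0 / scale

def mean_std_from_min_max(min_val, max_val):
    """(min, max) -> (mean, std_dev)."""
    return 255.0 * min_val / (min_val - max_val), 255.0 / (max_val - min_val)

def min_max_from_mean_std(mean, std_dev):
    """(mean, std_dev) -> (min, max)."""
    return -mean / std_dev, (255.0 - mean) / std_dev

# Example: the input tensor above reported quantization (scale, zero_point) = (1/255, 0)
mean, std_dev = mean_std_from_scale_zero_point(0.003921568859368563, 0)
print(mean, std_dev)                         # 0.0, ~255.0
print(min_max_from_mean_std(mean, std_dev))  # (~0.0, ~1.0) -> real input range (0, 1)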

To answer your question, if your input image has:

  • range (0, 255), then mean = 0, std_dev = 1
  • range (-1, 1), then mean = 127.5, std_dev = 127.5
  • range (0, 1), then mean = 0, std_dev = 255 (a quick check of this table follows the list)
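
A quick check of that table with the C) to A) formulas from above:

# Verify the (min, max) -> (mean, std_dev) table above.
for min_val, max_val in [(0.0, 255.0), (-1.0, 1.0), (0.0, 1.0)]:
    mean = 255.0 * min_val / (min_val - max_val)
    std_dev = 255.0 / (max_val - min_val)
    print((min_val, max_val), '->', (mean, std_dev))
# (0.0, 255.0) -> mean 0.0,   std_dev 1.0
# (-1.0, 1.0)  -> mean 127.5, std_dev 127.5
# (0.0, 1.0)   -> mean 0.0,   std_dev 255.0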

It looks like model inference is faster in the second case; does that depend on the fact that the model input is uint8?

Yes, possibly. However, quantized models are typically slower unless you make use of the vectorized instructions of your specific hardware. TFLite is optimized to run those specialized instructions on ARM processors. As of TF 1.14 or 1.15, if you are running this on a local x86 Intel or AMD machine, I'd be surprised if the quantized model ran faster. [Update: it's on TFLite's roadmap to add first-class support for x86 vectorized instructions to make quantized inference faster than float.]
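
If you want to verify this on your own machine, a rough (non-rigorous) timing sketch could look like the following; the model file names are assumptions:

import time
import numpy as np
import tensorflow as tf

def average_invoke_time(model_path, dtype, runs=100):
    # Load the model, feed one random input of the right shape/dtype,
    # and average the invoke() time over several runs.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    detail = interpreter.get_input_details()[0]
    data = (np.random.rand(*detail['shape']) * 255).astype(dtype)
    interpreter.set_tensor(detail['index'], data)
    start = time.time()
    for _ in range(runs):
        interpreter.invoke()
    return (time.time() - start) / runs

print('float32 model:', average_invoke_time('weights_only_quant.tflite', np.float32))
print('uint8 model  :', average_invoke_time('uint8_quant.tflite', np.uint8))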

What does 'quantization': (0.0, 0) mean in the first case, and 'quantization': (0.003921568859368563, 0), 'quantization': (0.7843137383460999, 128) in the second case?

The format here is quantization: (scale, zero_point).

In your first case, you only activated post_training_quantize=True; this doesn't make the model run quantized inference, so there is no need to transform the inputs or the outputs from float to uint8. The quantization stats here are therefore essentially null, which is represented as (0, 0).

In the second case, you activated quantized inference by providing inference_type = tf.contrib.lite.constants.QUANTIZED_UINT8. So you have quantization parameters for both the input and the output, which are needed to transform your float input to uint8 on the way into the model, and the uint8 output to a float output on the way out.

  • At the input, do the transformation: uint8_array = float_array * std_dev + mean (equivalently float_array / scale + zero_point)
  • At the output, do the transformation: float_array = (uint8_array.astype(np.float32) - mean) / std_dev (equivalently (uint8_array - zero_point) * scale)
  • Note the .astype(np.float32): it is necessary in Python to get the calculation right
  • Note that other texts use scale instead of std_dev, so the divisions become multiplications and vice versa; a sketch of both transforms follows this list
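
Putting this together, here is a sketch of driving the uint8 model end-to-end, doing the float/uint8 transforms by hand with the reported quantization parameters (the model file name is an assumption):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='uint8_quant.tflite')
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

in_scale, in_zero_point = input_detail['quantization']      # e.g. (1/255, 0)
out_scale, out_zero_point = output_detail['quantization']   # e.g. (0.7843..., 128)

# Quantize the float input: uint8 = float / scale + zero_point (= float * std_dev + mean)
float_input = np.random.rand(1, 128, 128, 3).astype(np.float32)   # values in [0, 1]
uint8_input = np.round(float_input / in_scale + in_zero_point)
uint8_input = np.clip(uint8_input, 0, 255).astype(np.uint8)

interpreter.set_tensor(input_detail['index'], uint8_input)
interpreter.invoke()

# Dequantize the uint8 output: float = (uint8 - zero_point) * scale (= (uint8 - mean) / std_dev)
uint8_output = interpreter.get_tensor(output_detail['index'])
float_output = (uint8_output.astype(np.float32) - out_zero_point) * out_scale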

Another confusing thing here is that even though during conversion you specify quantized_input_stats = (mean, std_dev), get_output_details will return quantization: (scale, zero_point); not only is the form different (scale vs. std_dev), the order is different too!

Now, to understand the quantization parameter values you got for the input and output, let's use the formulas above to deduce the range of real values ((min, max)) of your inputs and outputs. Using the formulas above we get:

  • Input range: min = 0, max = 1 (it is you who specified this by providing quantized_input_stats = {input_node_names[0]: (0.0, 255.0)} # (mean, stddev))
  • Output range: min = -100.39, max = 99.6 (derived in the snippet after this list)
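
The arithmetic behind those two ranges, using real = scale * (uint8 - zero_point) with uint8 in [0, 255]:

def real_range(scale, zero_point):
    # Smallest real value at uint8 = 0, largest at uint8 = 255.
    return scale * (0 - zero_point), scale * (255 - zero_point)

print(real_range(0.003921568859368563, 0))   # input : (0.0, ~1.0)
print(real_range(0.7843137383460999, 128))   # output: (~-100.39, ~99.61)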
