Why is TensorFlow Lite slower than TensorFlow on desktop?


Question

I'm currently working on single-image super-resolution and have managed to freeze an existing checkpoint file and convert it to TensorFlow Lite. However, when performing inference with the .tflite file, upsampling one image takes at least 4 times as long as when the model is restored from the .ckpt file.

Inference with the .ckpt file is done with session.run(), while inference with the .tflite file is done with interpreter.invoke(). Both operations were run on an Ubuntu 18 VM on a typical PC.

To find out more about the issue, I ran top in a separate terminal to watch the CPU utilization while either operation was performed. Utilization hits 270% with the .ckpt file, but stays at around 100% with the .tflite file.

import time

# Feed both inputs, then time a single TFLite invocation.
interpreter.set_tensor(input_details[0]['index'], input_image_reshaped)
interpreter.set_tensor(input_details[1]['index'], input_bicubic_image_reshaped)
start = time.time()
interpreter.invoke()
end = time.time()
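One caveat when timing a single invoke() call is that the first call can include one-off setup cost. As a sketch (the time_inference helper and its run counts are my own additions, not part of the question), averaging over several runs after a warm-up gives a steadier number:

```python
import time

def time_inference(invoke_fn, warmup=1, runs=10):
    """Return average seconds per call of invoke_fn, after warm-up calls."""
    for _ in range(warmup):
        invoke_fn()  # discard: first calls may include one-off setup work
    start = time.time()
    for _ in range(runs):
        invoke_fn()
    return (time.time() - start) / runs

# Usage with a TFLite interpreter: time_inference(interpreter.invoke)
# Usage with a session: time_inference(lambda: sess.run(y_, feed_dict=...))
```

The same helper works for both code paths, which keeps the comparison fair.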

versus

y = self.sess.run(
    self.y_,
    feed_dict={
        self.x: image.reshape(1, image.shape[0], image.shape[1], ch),
        self.x2: bicubic_image.reshape(1, self.scale * image.shape[0],
                                       self.scale * image.shape[1], ch),
        self.dropout: 1.0,
        self.is_training: 0,
    })

One hypothesis is that TensorFlow Lite is not configured for multithreading; another is that TensorFlow Lite is optimized for ARM processors (rather than the Intel one my computer runs on) and is therefore slower. However, I can't tell for sure, and I don't know how to trace the root of the issue. Hopefully someone out there knows more about this?

Answer

Yes, the current TensorFlow Lite op kernels are optimized for ARM processors (using the NEON instruction set). If SSE is available, it will try to use NEON_2_SSE to adapt the NEON calls to SSE, so it should still be running with some sort of SIMD. However, we haven't put much effort into optimizing this code path.
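One way to confirm whether SSE is actually available inside the VM is to inspect the CPU feature flags. A minimal Linux-only sketch (reading /proc/cpuinfo is my own suggestion, not part of the answer):

```python
def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags on Linux, or an empty set elsewhere."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

flags = cpu_flags()
print("SSE available:", "sse2" in flags or "sse4_2" in flags)
```

If the flags are missing (some hypervisors mask them), the NEON_2_SSE path cannot kick in and TFLite falls back to slower scalar code.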

Regarding the number of threads: there is a SetNumThreads function in the C++ API, but it's not exposed in the Python API (yet). When it's not set, the underlying implementation may try to probe the number of available cores. If you build the code yourself, you can try changing the value and see whether it affects the result.
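As a sketch of the probing behaviour described above, Python's standard library can report the core count such an implementation might detect. Note the num_threads keyword in the comment comes from later TensorFlow releases and may not exist in the version discussed here:

```python
import os

# What a "probe the available cores" step would see on this machine.
cores = os.cpu_count() or 1
print("Available cores:", cores)

# Later TensorFlow releases expose the thread count in Python, e.g.:
# interpreter = tf.lite.Interpreter(model_path="model.tflite",
#                                   num_threads=cores)
```

A utilization rate pinned near 100% in top, as the question observed, is consistent with the interpreter using a single thread.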

Hope this helps.

