How to build CUDA JIT caches for all available kernels in TensorFlow programmatically?
Problem description
I encountered the "first-run slow-down" problem with GTX 1080 cards and nvidia-docker, as discussed in this question.
I'm using TensorFlow from its official pip package and a custom Docker image based on nvidia-docker's Ubuntu 16.04 base image.
How do I make TensorFlow load (and build JIT caches for) all registered CUDA kernels programmatically in a Dockerfile, rather than building TensorFlow manually with the TF_CUDA_COMPUTE_CAPABILITIES environment variable?
Accepted answer
There seems to be no easy way to achieve this, since the CUDA runtime implicitly and lazily compiles missing cubins from the given kernel sources, as discussed here.
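As a partial workaround (my addition, not part of the original answer), the driver-side JIT cache can at least be persisted across container runs so the slow compilation only happens once. CUDA's documented `CUDA_CACHE_PATH` and `CUDA_CACHE_MAXSIZE` environment variables control where the compiled kernels are cached and how large the cache may grow; the `/opt/cuda-cache` path below is an arbitrary choice and would typically be backed by a mounted volume:

```shell
# Hypothetical Dockerfile fragment: persist and enlarge the CUDA JIT cache.
# CUDA_CACHE_PATH relocates the cache (default is under $HOME/.nv/ComputeCache),
# and CUDA_CACHE_MAXSIZE raises its size limit so compiled cubins are not evicted.
ENV CUDA_CACHE_PATH=/opt/cuda-cache
ENV CUDA_CACHE_MAXSIZE=2147483648
RUN mkdir -p /opt/cuda-cache
```

This does not avoid the first JIT compilation, but it keeps its results across container restarts as long as the cache directory survives.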
I solved this problem by rebuilding TensorFlow myself, with some helper scripts that detect the current CUDA/GPU configuration and generate the required TensorFlow configuration parameters (detect-cuda.py, build-tensorflow.sh).
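A detection helper in the spirit of the answer's detect-cuda.py might look like the sketch below (an illustration, not the actual script): it turns the output of `nvidia-smi --query-gpu=compute_cap` (available in recent drivers) into a comma-separated string suitable for `TF_CUDA_COMPUTE_CAPABILITIES`. The function name and structure are hypothetical; the parsing is separated from the `nvidia-smi` call so it can be exercised without a GPU.

```python
import subprocess

def detect_compute_capabilities(smi_output=None):
    """Return a TF_CUDA_COMPUTE_CAPABILITIES-style string such as "6.1,7.0".

    If smi_output is None, query nvidia-smi (requires a driver recent enough
    to support --query-gpu=compute_cap); otherwise parse the given text,
    which allows testing on machines without a GPU.
    """
    if smi_output is None:
        smi_output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            text=True,
        )
    # Deduplicate (multi-GPU hosts may repeat a capability) and sort for stability.
    caps = sorted({line.strip() for line in smi_output.splitlines() if line.strip()})
    return ",".join(caps)

# Example with canned output from a hypothetical two-GPU machine:
print(detect_compute_capabilities("6.1\n7.0\n"))  # -> 6.1,7.0
```

The resulting string would then be exported as `TF_CUDA_COMPUTE_CAPABILITIES` before running TensorFlow's `./configure`, so the build bakes in cubins for exactly the GPUs present and no JIT compilation is needed at runtime.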