How to build CUDA JIT caches for all available kernels in TensorFlow programmatically?


Question

I encountered the "first-run slow-down" problem with GTX 1080 cards and nvidia-docker, as discussed in this question.

I'm using the TensorFlow build from its official pip package and a custom Docker image based on nvidia-docker's Ubuntu 16.04 base image.

How do I make TensorFlow load (and build JIT caches for) all registered CUDA kernels programmatically in a Dockerfile? (Rather than manually building TensorFlow with the TF_CUDA_COMPUTE_CAPABILITIES environment variable.)
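For concreteness, here is a minimal sketch (my illustration, not code from the question) of the kind of "programmatic" warm-up a Dockerfile could run: executing a trivial op on the GPU forces the driver to JIT-compile that op's PTX and populate its compute cache (by default under ~/.nv/ComputeCache). The catch, as the answer below explains, is that this only caches kernels that actually get launched, not every registered one; note also that with nvidia-docker, GPUs are typically only visible at `docker run` time, not during `docker build`.

```python
# Hypothetical warm-up script: run one op on the GPU so the CUDA driver
# JIT-compiles its PTX and fills the compute cache. Only the kernels this
# op actually launches get cached.
import tensorflow as tf  # TF 1.x-style API, matching the era of this question

with tf.device("/gpu:0"):
    a = tf.random_normal([1024, 1024])
    b = tf.matmul(a, a)

with tf.Session() as sess:
    sess.run(b)  # the first run pays the PTX -> SASS JIT cost once
```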

Answer

There seems to be no easy way to achieve this, since the CUDA runtime implicitly and lazily compiles the missing cubins from the given kernel sources, as discussed here.
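Given that lazy compilation, one partial mitigation (my own sketch, not part of this answer) is to persist and enlarge the driver's JIT cache via the documented CUDA_CACHE_PATH and CUDA_CACHE_MAXSIZE environment variables, so each kernel pays the JIT cost at most once per persistent cache rather than once per container:

```python
import os

# Both variables are documented CUDA driver JIT-cache settings; they must be
# set before the first CUDA context is created (i.e. before any GPU op runs).
os.environ["CUDA_CACHE_PATH"] = "/opt/cuda-cache"  # mount as a volume; this path is my assumption
os.environ["CUDA_CACHE_MAXSIZE"] = str(1024 ** 3)  # 1 GiB, up from the small default

import tensorflow as tf  # import TensorFlow only after the variables are set
```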

I solved this problem by rebuilding TensorFlow myself, with some helper scripts to detect the current CUDA/GPU configuration and generate the required TensorFlow configuration parameters (detect-cuda.py, build-tensorflow.sh).
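The actual scripts are linked above; as a rough sketch of the detection step (my reconstruction, not the real detect-cuda.py), the compute capability of each installed GPU can be read through the CUDA driver API and joined into a TF_CUDA_COMPUTE_CAPABILITIES value for the build:

```python
# Sketch: query each GPU's compute capability via the CUDA driver API and
# print a value suitable for TF_CUDA_COMPUTE_CAPABILITIES at build time.
import ctypes

# CUdevice_attribute enums from cuda.h
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76

cuda = ctypes.CDLL("libcuda.so.1")
assert cuda.cuInit(0) == 0  # 0 == CUDA_SUCCESS

count = ctypes.c_int()
assert cuda.cuDeviceGetCount(ctypes.byref(count)) == 0

caps = set()
for dev in range(count.value):
    major, minor = ctypes.c_int(), ctypes.c_int()
    cuda.cuDeviceGetAttribute(
        ctypes.byref(major), CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev)
    cuda.cuDeviceGetAttribute(
        ctypes.byref(minor), CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev)
    caps.add("{}.{}".format(major.value, minor.value))

print(",".join(sorted(caps)))  # e.g. "6.1" for a GTX 1080
```

Building with an explicit capability list makes TensorFlow embed native SASS for those GPUs, so no PTX JIT compilation happens on first run.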
