TensorFlow takes >1 min on first run on video card with 5.0 compute capability


Question

I'm running TensorFlow 0.8.0 for Python 3 (installed with pip), and the following file, test.py:

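import tensorflow as tf

# Build a one-element int32 tensor and cast it to float32.
a = tf.convert_to_tensor([1], dtype=tf.int32)
b = tf.to_float(a)

# Evaluate the cast inside a default session.
with tf.Session():
    b.eval()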

... takes more than a minute to run:

$time python3 test.py 
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.61GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)

real    1m6.985s
user    1m6.700s
sys     0m1.480s

I should mention that other TensorFlow programs seem to work fine, e.g.

$time python3 -m tensorflow.models.image.mnist.convolutional

... takes less than 4 minutes.

$cat /usr/local/cuda/version.txt 
CUDA Version 7.5.18

$ls /usr/local/cuda/lib64/libcudnn*
/usr/local/cuda/lib64/libcudnn.so /usr/local/cuda/lib64/libcudnn.so.4.0.7
/usr/local/cuda/lib64/libcudnn.so.4 /usr/local/cuda/lib64/libcudnn_static.a

Recommended answer

I think your GPU, the GTX 860M, is an sm_50 device. The default TensorFlow binary is built for sm_35 and sm_52 only. Neither target gives your card native machine code, so for you the binary effectively has only PTX, and the CUDA runtime has to JIT-compile each kernel from PTX into SASS the first time that kernel runs, which takes a minute or so. The compiled kernels should be cached for later runs, unless caching was explicitly disabled.
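The reasoning above can be sketched in a few lines of Python. This is only an illustration, not part of TensorFlow or CUDA: the helper names `parse_cc`, `has_native_sass` and `jit_cache_settings` are hypothetical. It assumes the usual CUDA binary-compatibility rule (SASS built for X.y runs on devices X.z with z >= y) and reads the CUDA JIT cache environment variables `CUDA_CACHE_DISABLE`, `CUDA_CACHE_PATH` and `CUDA_CACHE_MAXSIZE`. It does not query the GPU or the TensorFlow binary itself.

```python
import os


def parse_cc(text):
    """Turn a compute capability such as "5.0" into a (major, minor) tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def has_native_sass(device_cc, built_ccs):
    """Return True if any prebuilt SASS target can run directly on the device.

    Assumed rule: SASS built for X.y runs on devices X.z with z >= y,
    and never across major versions.
    """
    dev_major, dev_minor = parse_cc(device_cc)
    for cc in built_ccs:
        major, minor = parse_cc(cc)
        if major == dev_major and minor <= dev_minor:
            return True
    return False


def jit_cache_settings(env=None):
    """Report the CUDA JIT cache environment variables (None if unset)."""
    env = os.environ if env is None else env
    names = ("CUDA_CACHE_DISABLE", "CUDA_CACHE_PATH", "CUDA_CACHE_MAXSIZE")
    return {name: env.get(name) for name in names}


if __name__ == "__main__":
    built = ["3.5", "5.2"]  # SASS targets named in the answer above
    for device in ("3.5", "5.0", "5.2"):
        if has_native_sass(device, built):
            mode = "native SASS"
        else:
            mode = "PTX JIT on first run"
        print(device, "->", mode)
    print(jit_cache_settings())
```

If `CUDA_CACHE_DISABLE` is set to 1, the JIT result is not kept and every run pays the compilation cost again. On Linux the cache normally lives under `~/.nv/ComputeCache` unless `CUDA_CACHE_PATH` points elsewhere. If the cache is too small to hold all compiled kernels, raising `CUDA_CACHE_MAXSIZE` may help.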
