Using TensorFlow with GPU: loading the CUDA-related libraries takes a long time


Problem description

Machine Setting:

  • GPU: GeForce RTX 3060

  • Driver Version: 460.73.01

  • CUDA Driver Version: 11.2

  • TensorFlow: tensorflow-gpu 1.14.0

  • CUDA Runtime Version: 10.0

  • cuDNN: 7.4.1

Note:

  1. The CUDA Runtime and cuDNN versions match the guide in the official TensorFlow documentation.
  2. I have also tried tensorflow-gpu 2.0 and hit the same problem.

Problem:

I am using TensorFlow for an object detection task. The program gets stuck at

2021-06-05 12:16:54.099778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10

for several minutes.

It then gets stuck at the next library load

2021-06-05 12:21:22.212818: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7

for even longer. See log.txt for the full log.

After waiting around 30 minutes, the program starts running and works well.

However, whenever the program invokes self.session.run(...), it loads the same two CUDA libraries (libcublas and libcudnn) again, which wastes a lot of time.

I don't understand where the problem comes from or how to resolve it. Can anyone help?

Discussion issue on GitHub

===================================

Update

With @talonmies's help, the problem was resolved by rebuilding the environment with matching versions of the GPU driver, CUDA, cuDNN and TensorFlow. It now works smoothly.

Solution

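Generally, you will observe this behavior when the TensorFlow, CUDA and cuDNN versions are incompatible with each other or with the GPU. The most likely reason for the long pauses is that libraries built for CUDA 10.0 contain no precompiled kernels for this GPU's architecture, so the driver has to JIT-compile them from PTX when the libraries load.

For the GeForce RTX 3060, support starts with CUDA 11.x, while tensorflow-gpu 1.14 and 2.0 are built against CUDA 10.0. Upgrading to TF 2.4 or TF 2.5, which are built against CUDA 11.x, should resolve the issue.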
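To see which CUDA and cuDNN versions an installed TensorFlow was actually built against, something like the following minimal sketch may help. It assumes `tf.sysconfig.get_build_info()` is available (it was added in later TF 2.x releases, so the code falls back gracefully when it is missing); the helper name `describe_tf_cuda_build` is hypothetical and not part of TensorFlow.

```python
import importlib.util


def describe_tf_cuda_build():
    """Return the TF version and the CUDA/cuDNN versions it was built with.

    Returns None when TensorFlow is not installed. The CUDA/cuDNN entries
    are only present when this TF release exposes get_build_info().
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None

    import tensorflow as tf

    info = {"tf_version": tf.__version__}

    # get_build_info() does not exist in older releases such as 1.14 or 2.0,
    # so look it up defensively instead of calling it directly.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        build = get_build_info()
        # CPU-only builds may not carry these keys, hence .get().
        info["cuda_version"] = build.get("cuda_version")
        info["cudnn_version"] = build.get("cudnn_version")
    return info


if __name__ == "__main__":
    print(describe_tf_cuda_build())
```

Comparing the reported CUDA version with the "CUDA Version" shown by `nvidia-smi` (the highest version the driver supports) and with the GPU's minimum requirement gives a quick first check for the kind of mismatch described in the question.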

For the benefit of the community, the relevant references are TensorFlow's tested build configurations and the NVIDIA CUDA Support Matrix.
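As a rough illustration of how the tested build configurations apply to this case, the sketch below encodes a small subset of that table and checks each release against the "CUDA 11.x" requirement stated above. The table entries are assumptions copied from the official tested-configuration list as I understand it, and the names `TESTED_BUILDS`, `MIN_CUDA_MAJOR_FOR_RTX_30` and `built_for_rtx_30` are hypothetical; verify against the current documentation before relying on them.

```python
# Subset of TensorFlow's tested GPU build configurations:
# TF release -> (CUDA version, cuDNN version).
# Assumed values; check the official table for the authoritative list.
TESTED_BUILDS = {
    "1.14": ("10.0", "7.4"),
    "2.0": ("10.0", "7.4"),
    "2.4": ("11.0", "8.0"),
    "2.5": ("11.2", "8.1"),
}

# Assumption taken from the answer: RTX 30-series support starts at CUDA 11.x.
MIN_CUDA_MAJOR_FOR_RTX_30 = 11


def built_for_rtx_30(tf_release: str) -> bool:
    """Return True if the release's tested CUDA version is 11.x or newer."""
    cuda_version, _cudnn_version = TESTED_BUILDS[tf_release]
    cuda_major = int(cuda_version.split(".")[0])
    return cuda_major >= MIN_CUDA_MAJOR_FOR_RTX_30


if __name__ == "__main__":
    for release, (cuda, cudnn) in TESTED_BUILDS.items():
        verdict = "ok" if built_for_rtx_30(release) else "CUDA too old"
        print(f"TF {release}: CUDA {cuda}, cuDNN {cudnn} -> {verdict}")
```

Under these assumptions, the two releases tried in the question (1.14 and 2.0) are flagged as too old for the RTX 3060, while 2.4 and 2.5 pass, which matches the recommendation in the answer.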
