CUDNN_STATUS_NOT_INITIALIZED when trying to run TensorFlow


Question

I have installed TensorFlow 1.7 on Ubuntu 16.04 with CUDA 9.0, cuDNN 7.0.5, and vanilla Python 2.7. The samples for both CUDA and cuDNN run fine, and TensorFlow sees the GPU (so some TensorFlow examples run), but those that use cuDNN (like most CNN examples) do not. They fail with these log messages:

2018-04-10 16:14:17.013026: I tensorflow/stream_executor/plugin_registry.cc:243] Selecting default DNN plugin, cuDNN
2018-04-10 16:14:17.013100: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-04-10 16:14:17.013119: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.130  Wed Mar 21 03:37:26 PDT 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)
"""
2018-04-10 16:14:17.013131: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:112] version string "384.130" made value 384.130.0
2018-04-10 16:14:17.013135: E tensorflow/stream_executor/cuda/cuda_dnn.cc:411] possibly insufficient driver version: 384.130.0
2018-04-10 16:14:17.013139: E tensorflow/stream_executor/cuda/cuda_dnn.cc:370] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-04-10 16:14:17.013143: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
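Each entry in this log carries a one-letter severity after the timestamp (I for info, W for warning, E for error, F for fatal), so the lines that matter can be separated from the informational ones mechanically. The following is only a rough sketch: the regular expression is inferred from the lines shown here rather than taken from TensorFlow, and `problem_lines` and `SAMPLE` are hypothetical names used for illustration.

```python
import re

# Pattern inferred from the log lines above (an assumption, not a TensorFlow spec):
#   <date> <time>: <severity> <source file>:<line>] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) "
    r"(?P<source>\S+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def problem_lines(log_text):
    """Return (severity, source, message) for every error (E) or fatal (F) entry."""
    found = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("severity") in ("E", "F"):
            found.append(
                (match.group("severity"), match.group("source"), match.group("message"))
            )
    return found


# Three entries copied from the log in the question.
SAMPLE = """\
2018-04-10 16:14:17.013026: I tensorflow/stream_executor/plugin_registry.cc:243] Selecting default DNN plugin, cuDNN
2018-04-10 16:14:17.013100: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-04-10 16:14:17.013135: E tensorflow/stream_executor/cuda/cuda_dnn.cc:411] possibly insufficient driver version: 384.130.0
"""

for severity, source, message in problem_lines(SAMPLE):
    print(severity, source, message)
```

Continuation lines without a timestamp (such as the GCC version line) simply do not match and are skipped.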

Turning on a flood of VLOG messages (see my link below for how to do this) did not produce any additional relevant messages.
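The linked instructions are not reproduced on this page. One common way to get verbose output, which may or may not be what the author used, is to set the `TF_CPP_MIN_VLOG_LEVEL` environment variable before TensorFlow is imported. In this sketch, `enable_tf_vlog` is a hypothetical helper name and the level of 2 is an arbitrary choice.

```python
import os


def enable_tf_vlog(level=2, env=os.environ):
    """Set TensorFlow's C++ logging variables; call before `import tensorflow`."""
    # VLOG output is emitted at INFO severity, so INFO must not be filtered out.
    env["TF_CPP_MIN_LOG_LEVEL"] = "0"
    # Higher values produce more verbose VLOG output.
    env["TF_CPP_MIN_VLOG_LEVEL"] = str(level)
    return env


enable_tf_vlog(2)
# import tensorflow as tf  # import only after the variables are set
```

The same effect can be had by exporting both variables in the shell before starting Python.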

The key message here might be "Selecting default DNN plugin, cuDNN": from looking at the code, I suspect it means TensorFlow cannot load the cuDNN library modules. For all I know, though, that message is actually normal (it is informational, not a warning) and the problem could be something else.

For example, in an earlier version the "CUDNN_STATUS_NOT_INITIALIZED" message seems to have been caused by TF allocating memory too aggressively ahead of time, so that cuDNN could not initialize (I found this in the TF GitHub issues list). I tried those remedies, including resetting the GPU and rebooting, but they did not help.

Any ideas as to what I should try next?

Recommended answer

OK, I found the cause: I had the wrong version of cuDNN installed, so my suspicion that TensorFlow was not actually finding the correct shared library was right.

Basically, I had installed cuDNN v7.1.2 for CUDA 9.1 instead of cuDNN v7.1.2 for CUDA 9.0, which seems to have caused it to fail silently, although I would have expected an error message at that point. Note that I had detailed VLOGs running (see my answer on the post "Turning on TF Logs" for more information on how to do that).

When I installed cuDNN v7.1.2 for CUDA 9.0, TensorFlow did in fact find it and complained that the version was not new enough. The real problem was that it was not old enough, but at least I now had some real data to work with.

In the end, cuDNN v7.0.5 for CUDA 9.0 was what I needed, and that worked.
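Because the failing and working installs differed in both the cuDNN version and the CUDA version cuDNN was built for, it can help to ask the shared library itself what it is. The sketch below is not part of the original answer. It assumes the library is reachable by the dynamic loader under the name `libcudnn.so.7` (adjust for your install), and it relies on the cuDNN 7 version encoding (major*1000 + minor*100 + patch). `query_cudnn` and the two decode helpers are hypothetical names.

```python
import ctypes


def decode_cudnn_version(value):
    """cuDNN 7 packs its version as major*1000 + minor*100 + patch (7005 is 7.0.5)."""
    return value // 1000, (value % 1000) // 100, value % 100


def decode_cudart_version(value):
    """CUDA packs its runtime version as major*1000 + minor*10 (9010 is 9.1)."""
    return value // 1000, (value % 1000) // 10


def query_cudnn(library_name="libcudnn.so.7"):
    """Return ((cudnn major, minor, patch), (cuda major, minor)), or None if not loadable."""
    try:
        lib = ctypes.CDLL(library_name)  # library name is an assumption
    except OSError:
        return None
    # Both functions return size_t in the cuDNN C API.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetCudartVersion.restype = ctypes.c_size_t
    return (
        decode_cudnn_version(lib.cudnnGetVersion()),
        decode_cudart_version(lib.cudnnGetCudartVersion()),
    )


print(query_cudnn())
```

On the setup that worked for the answer, one would expect this to report cuDNN (7, 0, 5) built against CUDA (9, 0). A result of `None` means the loader could not find the library under that name, which is itself a useful hint.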
