Tf 2:无法创建cudnn句柄:CUDNN_STATUS_INTERNAL_ERROR [英] Tf 2: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

查看:98
本文介绍了Tf 2:无法创建cudnn句柄:CUDNN_STATUS_INTERNAL_ERROR的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我执行以下代码时,出现上述错误(无法创建cudnn句柄:CUDNN_STATUS_INTERNAL_ERROR).我已经检查过我的GPU是否正在使用tf.test.is_gpu_available

I am getting the above error (Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR) when I execute the code below. I have cheked if my gpu is woking using tf.test.is_gpu_available

# coding: utf-8

import tensorflow as tf
import numpy as np
import keras
from models import *
import os 
import gc 

TF_FORCE_GPU_ALLOW_GROWTH = True

np.random.seed(1000)
#Paths
MODEL_CONF = "../models/conf/"
MODEL_WEIGHTS = "../models/weights/"
#Model informations
N_CLASSES = 3


def load_array(name):
    return np.load(name, allow_pickle = True)


gc.collect()

dirData = "saved_data/"
trainDir = dirData + "train/"

model = AdaptedLeNet((168, 168, 8), N_CLASSES)
model.summary(print_fn=lambda x: print(x + '\n'))

# Compile the model with the specified loss function.
model.compile(optimizer=keras.optimizers.Adam(),
            loss='categorical_crossentropy',
            metrics=['accuracy'])

for filename in os.listdir(trainDir):
    data = load_array(trainDir + filename)

    train = data["a"]
    labels = data["b"].astype(int).reshape(-1) 
    one_hot_targets = np.eye(N_CLASSES)[labels]

    model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)

    gc.collect()

此代码的输出是:

Epoch 1/5
2020-04-03 18:50:43.397010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-03 18:50:43.608330: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-03 18:50:44.274270: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275686: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-03 18:50:44.275747: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node conv2d_1/convolution}}]]
Traceback (most recent call last):
  File "cnnAlert.py", line 62, in <module>
    model.fit(x=train, y=one_hot_targets, batch_size=32, epochs=5)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3727, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/home/geodatin/env/py3GEE/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node conv2d_1/convolution (defined at /home/geodatin/env/py3GEE/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_2350]

Function call stack:
keras_scratch_graph

更多信息:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    Off  | 00000000:01:00.0  On |                  N/A |
| 27%   41C    P8     9W / 120W |    211MiB /  5911MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       989      G   /usr/lib/xorg/Xorg                            78MiB |
|    0      1438      G   cinnamon                                      31MiB |
|    0      8622      G   ...uest-channel-token=16736224539216711033    99MiB |
+-----------------------------------------------------------------------------+
3

如何解决此错误?你能帮助我吗?

How do I solve this error? Can you help me?

编辑1

  • 来自cudnn.h的CUDNN_VERSION:7605(7.6.5)
  • 主机编译器版本:GCC 7.5.0
  • Tensorflow:2.1.0-rc0;
  • CUDNN库在我的LD_LIBRARY_PATH中

推荐答案

您可能需要将tensorflow会话config.gpu_option.allow_growth设置为true,这可以通过在代码顶部添加以下内容来实现:

You might need to set the tensorflow session config.gpu_option.allow_growth to true, which can be done by adding the following to the top of your code:

gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
keras.backend.tensorflow_backend.set_session(sess)

这篇关于Tf 2:无法创建cudnn句柄:CUDNN_STATUS_INTERNAL_ERROR的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆