这可能是因为 cuDNN 初始化失败,所以尝试查看上面是否打印了警告日志消息.[操作:Conv2D] [英] This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

查看:44
本文介绍了这可能是因为 cuDNN 初始化失败,所以尝试查看上面是否打印了警告日志消息.[操作:Conv2D]的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 anaconda 中安装了 TensorFlow-GPU 2.0,当我安装它并导入包然后运行我的 CNN 模型时,它工作正常,但是当我尝试运行训练模型时出现错误.

I install TensorFlow-GPU 2.0 in my anaconda when I install it and import the package then run my CNN model it's working correctly but when I try to run the training model the error appears.

这是我的错误报告:

Epoch 1/50
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-5-c4639d74909a> in <module>
      6                                         epochs=50,
      7                                         validation_data=testing_set,
----> 8                                         validation_steps=50)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1295         shuffle=shuffle,
   1296         initial_epoch=initial_epoch,
-> 1297         steps_name='steps_per_epoch')
   1298 
   1299   def evaluate_generator(self,

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
    263 
    264       is_deferred = not model._is_compiled
--> 265       batch_outs = batch_function(*batch_data)
    266       if not isinstance(batch_outs, list):
    267         batch_outs = [batch_outs]

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
    971       outputs = training_v2_utils.train_on_batch(
    972           self, x, y=y, sample_weight=sample_weight,
--> 973           class_weight=class_weight, reset_metrics=reset_metrics)
    974       outputs = (outputs['total_loss'] + outputs['output_losses'] +
    975                  outputs['metrics'])

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py in train_on_batch(model, x, y, sample_weight, class_weight, reset_metrics)
    262       y,
    263       sample_weights=sample_weights,
--> 264       output_loss_metrics=model._output_loss_metrics)
    265 
    266   if reset_metrics:

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py in train_on_batch(model, inputs, targets, sample_weights, output_loss_metrics)
    309           sample_weights=sample_weights,
    310           training=True,
--> 311           output_loss_metrics=output_loss_metrics))
    312   if not isinstance(outs, list):
    313     outs = [outs]

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py in _process_single_batch(model, inputs, targets, output_loss_metrics, sample_weights, training)
    250               output_loss_metrics=output_loss_metrics,
    251               sample_weights=sample_weights,
--> 252               training=training))
    253       if total_loss is None:
    254         raise ValueError('The model cannot be run '

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py in _model_loss(model, inputs, targets, output_loss_metrics, sample_weights, training)
    125     inputs = nest.map_structure(ops.convert_to_tensor, inputs)
    126 
--> 127   outs = model(inputs, **kwargs)
    128   outs = nest.flatten(outs)
    129 

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
    889           with base_layer_utils.autocast_context_manager(
    890               self._compute_dtype):
--> 891             outputs = self.call(cast_inputs, *args, **kwargs)
    892           self._handle_activity_regularization(inputs, outputs)
    893           self._set_mask_metadata(inputs, outputs, input_masks)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\sequential.py in call(self, inputs, training, mask)
    254       if not self.built:
    255         self._init_graph_network(self.inputs, self.outputs, name=self.name)
--> 256       return super(Sequential, self).call(inputs, training=training, mask=mask)
    257 
    258     outputs = inputs  # handle the corner case where self.layers is empty

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\network.py in call(self, inputs, training, mask)
    706     return self._run_internal_graph(
    707         inputs, training=training, mask=mask,
--> 708         convert_kwargs_to_constants=base_layer_utils.call_context().saving)
    709 
    710   def compute_output_shape(self, input_shape):

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\network.py in _run_internal_graph(self, inputs, training, mask, convert_kwargs_to_constants)
    858 
    859           # Compute outputs.
--> 860           output_tensors = layer(computed_tensors, **kwargs)
    861 
    862           # Update tensor_dict.

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
    889           with base_layer_utils.autocast_context_manager(
    890               self._compute_dtype):
--> 891             outputs = self.call(cast_inputs, *args, **kwargs)
    892           self._handle_activity_regularization(inputs, outputs)
    893           self._set_mask_metadata(inputs, outputs, input_masks)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py in call(self, inputs)
    195 
    196   def call(self, inputs):
--> 197     outputs = self._convolution_op(inputs, self.kernel)
    198 
    199     if self.use_bias:

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
   1132           call_from_convolution=False)
   1133     else:
-> 1134       return self.conv_op(inp, filter)
   1135     # copybara:strip_end
   1136     # copybara:insert return self.conv_op(inp, filter)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
    637 
    638   def __call__(self, inp, filter):  # pylint: disable=redefined-builtin
--> 639     return self.call(inp, filter)
    640 
    641 

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in __call__(self, inp, filter)
    236         padding=self.padding,
    237         data_format=self.data_format,
--> 238         name=self.name)
    239 
    240 

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, dilations, name, filters)
   2008                            data_format=data_format,
   2009                            dilations=dilations,
-> 2010                            name=name)
   2011 
   2012 

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name)
   1029             input, filter, strides=strides, use_cudnn_on_gpu=use_cudnn_on_gpu,
   1030             padding=padding, explicit_paddings=explicit_paddings,
-> 1031             data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
   1032       except _core._SymbolicException:
   1033         pass  # Add nodes to the TensorFlow graph.

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py in conv2d_eager_fallback(input, filter, strides, padding, use_cudnn_on_gpu, explicit_paddings, data_format, dilations, name, ctx)
   1128   explicit_paddings, "data_format", data_format, "dilations", dilations)
   1129   _result = _execute.execute(b"Conv2D", 1, inputs=_inputs_flat, attrs=_attrs,
-> 1130                              ctx=_ctx, name=name)
   1131   _execute.record_gradient(
   1132       "Conv2D", _inputs_flat, _attrs, _result, name)

~\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

~\Anaconda3\envs\tf-gpu\lib\site-packages\six.py in raise_from(value, from_value)

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

1
# Save a model
2
model.save('Datasets/300_train/CNN_300.tflearn')

这是我的 CNN 代码:

This is my CNN Code:

作为信息,我必须安装带有 NVCUDA.DLL 10.2.95 的 NVIDIA GPU 驱动程序版本 441.20

as the information, I have to install NVIDIA GPU Driver version 441.20 with NVCUDA.DLL 10.2.95

]1

推荐答案

我遇到了类似的问题.
已安装的机器:Python3.7、tensorflow-gpu 2.0、Cuda V10.0、cuDnn 7.4、Nvidia 驱动程序 411 版本、Windows 10(依赖项如 TF2 文档中所述).
在尝试重新配置和重新安装所有内容 3 天后,唯一有效的是:

I had a similar issue.
Installed machine: Python3.7, tensorflow-gpu 2.0, Cuda V10.0, cuDnn 7.4, Nvidia driver 411 version, Windows 10 (dependencies just as stated in TF2 documentation).
After 3 days of trying to reconfigure and reinstall everything, the only thing that worked was:

  1. 卸载 Cuda、cuDnn 和 tensorflow2
  2. 将 Nvidia 驱动程序更新为 441
  3. 安装 Cuda V10.0
  4. 安装 cuDnn 7.6(不是文档中所述的 7.4!)
  5. 安装 tensorflow-gpu2

请注意 tensoflow 重新编译自己很重要 - 这发生在第一次安装 nvidia 驱动程序和 tensorflow-gpu 以及从 python 代码调用任何 tensorflow 函数之后(它使代码挂起至少 2 分钟 - 在我的情况下是关于10分钟).重新安装 tensorflow\cuda 不会启动另一个 tensorflow 的重新编译过程,只有重新安装 nvidia 驱动程序才可以.

Notice it is important that tensoflow recompiles itself - this happens after first installation of nvidia driver and tensorflow-gpu and after calling any tensorflow function from python code (it makes the code hang for minimum 2 minutes - in my case it was about 10 minutes). Reinstalling tensorflow\cuda won't initiate another re-compile process of tensorflow, only reinstalling nvidia driver does.

这篇关于这可能是因为 cuDNN 初始化失败,所以尝试查看上面是否打印了警告日志消息.[操作:Conv2D]的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆