Why does adding a convolution/pooling layer crash a Keras/TensorFlow model running on RTX 3070/cuDNN 8/CUDA 11.1?

Question

System information

  • OS: Windows 10,
  • cudnn: 8.0,
  • CUDA toolkit: 11.1 installed on top of 10.2,
  • GPU: Nvidia RTX 3070,
  • CPU: Intel I7 10700f,
  • Tensorflow: tf.__version__==2.4.0rc-0 (have also tried with tf-nightly-gpu as late as Dec 7, 2020)
  • CUDA, cudnn compiled manually from source

Test code

The code below builds and compiles a model successfully, but crashes when model.fit(...) is called.


import tensorflow as tf  # needed for tf.keras.losses below
from tensorflow.keras import datasets, layers, models

# Load CIFAR-10 and scale pixel values to the range [0, 1].
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Small CNN: three convolution layers with max pooling, then a dense classifier.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))  # raw logits, one per CIFAR-10 class

model.compile(optimizer='Adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# The crash occurs here, once training starts.
history = model.fit(train_images, train_labels, batch_size=10, epochs=100)

If I remove the convolution and max-pooling layers and simply flatten the tensors after the input, the model trains fine (the output of such a model is obviously useless, but it is still able to train).

When the program crashes, the only message reported is: Process finished with exit code -1073740791 (0xC0000409)
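The negative exit code and the hexadecimal value in parentheses are the same 32-bit number, shown once as a signed and once as an unsigned integer. A minimal sketch of the conversion follows; the helper name to_ntstatus_hex is made up for illustration and is not part of any library.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex value.

    Hypothetical helper for illustration only.
    """
    # Masking with 0xFFFFFFFF maps a negative value onto its
    # two's-complement unsigned 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus_hex(-1073740791))  # 0xC0000409
```

On Windows, 0xC0000409 is the status code STATUS_STACK_BUFFER_OVERRUN, which is also used for generic fail-fast termination. On its own it says only that the process was terminated abruptly, not which component caused it.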

Additionally, when calling tf.config.list_physical_devices('GPU')

UPDATE: I opened an issue on the TensorFlow GitHub page, which you can find here

Recommended answer

For whatever reason, when the script was run in the IDE terminal the real error message was suppressed, and only Process finished with exit code -1073740791 (0xC0000409) was logged.

When the script was run from the command line, the following error messages were displayed instead of just the exit code.

Could not load library cudnn_ops_infer64_8.dll. Error code 126
Please make sure cudnn_ops_infer64_8.dll is in your library path!
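If running from a separate command prompt is inconvenient, one possible way to surface the hidden output is to launch the script in a child process and print everything it writes. The sketch below assumes the message is written to the child's stdout or stderr. The helper name run_and_capture is hypothetical, and the script path is whatever file holds the test code, passed as a command-line argument.

```python
import subprocess
import sys


def run_and_capture(script_path: str) -> subprocess.CompletedProcess:
    """Run a Python script in a child process and capture its output.

    Hypothetical helper for illustration only.
    """
    return subprocess.run(
        [sys.executable, script_path],
        capture_output=True,  # collect both stdout and stderr
        text=True,            # decode the captured bytes to str
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_file.py path/to/training_script.py
    result = run_and_capture(sys.argv[1])
    print("exit code:", result.returncode)
    print("--- stdout ---")
    print(result.stdout)
    print("--- stderr ---")
    print(result.stderr)
```

Because the child's streams are captured rather than routed through the IDE console, anything the process prints before it terminates is still available in result.stdout and result.stderr.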

I recognized this as a file shipped with the cuDNN library, so I copied it from the bin folder of cuDNN into NVIDIA GPU Computing Toolkit > CUDA > V11.0 > bin. Repeating this for the files below resolved the issue.

cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
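A quick way to see which of these files are still missing is to look for each one in the directories listed on PATH. This is only a rough sketch: PATH is just one of the places Windows searches for DLLs, and the helper name find_missing is made up for illustration.

```python
import os
from pathlib import Path

# The cuDNN 8 DLLs named in the answer above.
CUDNN_DLLS = [
    "cudnn_adv_infer64_8.dll",
    "cudnn_adv_train64_8.dll",
    "cudnn_cnn_infer64_8.dll",
    "cudnn_cnn_train64_8.dll",
    "cudnn_ops_infer64_8.dll",
    "cudnn_ops_train64_8.dll",
]


def find_missing(dll_names, search_dirs):
    """Return the names that are not present in any of the directories.

    Hypothetical helper for illustration only.
    """
    missing = []
    for name in dll_names:
        if not any((Path(d) / name).is_file() for d in search_dirs):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Split PATH into its directories, skipping empty entries.
    path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    for name in find_missing(CUDNN_DLLS, path_dirs):
        print("not found on PATH:", name)
```

If the script prints nothing, every listed file was found in at least one PATH directory. Any name it prints is a candidate for the copy step described above.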
