使用CUDNN_STATUS_ALLOC_FAILED的Tensorflow崩溃 [英] Tensorflow crash with CUDNN_STATUS_ALLOC_FAILED

查看:445
本文介绍了使用CUDNN_STATUS_ALLOC_FAILED的Tensorflow崩溃的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于在网上搜索了好几个小时都没有结果,所以我想在这里问.

Been searching the web for hours with no results, so figured I'd ask here.

我正在按照Sentdex的教程制作自动驾驶汽车,但是在运行模型时,出现了一系列致命错误.我已经在互联网上搜索了解决方案,但许多人似乎都遇到了同样的问题.但是,我没有找到任何解决方案(包括此Stack-post ),为我工作.

I'm trying to make a self driving car following Sentdex's tutorial, but when running the model, I get a bunch of fatal errors. I've searched all over the internet for the solution, and many seem to have the same problem. However, none of the solutions I've found (Including this Stack-post), work for me.

这是我的软件:

  • Tensorflow:1.5,GPU版本
  • CUDA:9.0,带有修补程序
  • CUDnn:7
  • Windows 10专业版
  • Python 3.6

硬件:

  • Nvidia 1070ti,带有最新驱动程序
  • 英特尔i5 7600K

以下是崩溃日志:

2018-02-04 16:29:33.606903: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 2018-02-04 16:29:33.608872: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 2018-02-04 16:29:33.609308: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 2018-02-04 16:29:35.145249: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED 2018-02-04 16:29:35.145563: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM 2018-02-04 16:29:35.149896: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

2018-02-04 16:29:33.606903: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 2018-02-04 16:29:33.608872: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 2018-02-04 16:29:33.609308: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:444] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED 2018-02-04 16:29:35.145249: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED 2018-02-04 16:29:35.145563: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM 2018-02-04 16:29:35.149896: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

这是我的代码:

 import tensorflow as tf
    import numpy as np
    import cv2
    import time
    from PIL import ImageGrab
    from getkeys import key_check
    from alexnet import alexnet
    import os
    from sendKeys import PressKey, ReleaseKey, W,A,S,D,Sp

    import random

    WIDTH = 80
    HEIGHT = 60
    LR = 1e-3
    EPOCHS = 10
    MODEL_NAME = 'DiRT-AI-Driver-{}-{}-{}-epochs.model'.format(LR, 'alexnetv2', EPOCHS)

    def straight():
        PressKey(W)
        ReleaseKey(A)
        ReleaseKey(S)
        ReleaseKey(D)
        ReleaseKey(Sp)
    def left():
        PressKey(A)
        ReleaseKey(W)
        ReleaseKey(S)
        ReleaseKey(D)
        ReleaseKey(Sp)
    def right():
        PressKey(D)
        ReleaseKey(A)
        ReleaseKey(S)
        ReleaseKey(W)
        ReleaseKey(Sp)
    def brake():
        PressKey(S)
        ReleaseKey(A)
        ReleaseKey(W)
        ReleaseKey(D)
        ReleaseKey(Sp)
    def handbrake():
        PressKey(Sp)
        ReleaseKey(A)
        ReleaseKey(S)
        ReleaseKey(D)
        ReleaseKey(W)

    model = alexnet(WIDTH, HEIGHT, LR)
    model.load(MODEL_NAME)


    def main():
        last_time = time.time()
        for i in list(range(4))[::-1]:
            print(i+1)
            time.sleep(1)


    paused = False
    while(True):
            if not paused:
                screen = np.array(ImageGrab.grab(bbox=(0,40,1024,768)))
                screen = cv2.cvtColor(screen,cv2.COLOR_BGR2GRAY)
                screen = cv2.resize(screen,(80,60))
                print('Loop took {} seconds'.format(time.time()-last_time))
                last_time = time.time()
                print('took time')
                prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]
                print('predicted')
                moves = list(np.around(prediction))
                print('got moves')
                print(moves,prediction)

                if moves == [1,0,0,0,0]:
                    straight()
                elif moves == [0,1,0,0,0]:
                    left()
                elif moves == [0,0,1,0,0]:
                    brake()
                elif moves == [0,0,0,1,0]:
                    right()
                elif moves == [0,0,0,0,1]:
                    handbrake()

            keys = key_check()

            if 'T' in keys:
                if paused:
                    pased = False
                    time.sleep(1)
                else:
                    paused = True
                    ReleaseKey(W)
                    ReleaseKey(A)
                    ReleaseKey(S)
                    ReleaseKey(D)
                    ReleaseKey(Sp)
                    time.sleep(1)


main()

我发现使python崩溃并产生前三个bug的行是以下行:

I've found that the line that crashes python and spawns the first three bugs is this line:

  • prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]

运行代码时,CPU的运行速度高达100%,这表明有严重问题. GPU的使用率约为40-50%

When running the code, the CPU goes up to a whopping 100%, suggesting that something is seriously off. GPU goes to about 40-50%

我尝试过Tensorflow 1.2和1.3以及CUDA 8,效果不佳.安装CUDA时,我不安装特定的驱动程序,因为它们对于我的GPU而言太旧了.也尝试过不同的CUDnn,效果不好.

I've tried Tensorflow 1.2 and 1.3, as well as CUDA 8, to no good. When installing CUDA I do not install the specific drivers, since they are too old for my GPU. Tried different CUDnn's too, did no good.

推荐答案

就我而言,发生此问题是因为正在运行另一个导入了tensorflow的python控制台.关闭它可以解决问题.

In my case, the issue happened because another python console with tensorflow imported was running. Closing it solved the problem.

我有Windows 10,主要错误是:

I have Windows 10, the main errors were :

无法创建cublas句柄:CUBLAS_STATUS_ALLOC_FAILED

failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED

无法创建cudnn句柄:CUDNN_STATUS_ALLOC_FAILED

Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

这篇关于使用CUDNN_STATUS_ALLOC_FAILED的Tensorflow崩溃的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆