Calculation on GPU leads to driver error "stopped responding"


Problem description

I have this little nonsense script here which I am executing in MATLAB R2013b:

clear all;

n = 2000;
times = 50;
i = 0;

tCPU = tic;

disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = 0:times
    CPU = A * B;
end

tCPU = toc(tCPU);
tGPU = tic;

disp 'GPU::'
A = gpuArray(A);
B = gpuArray(B);
disp '::Go'
for i = 0:times
    GPU =  A * B ; 
end
tGPU = toc(tGPU);

fprintf('On CPU: %.2f sec\nOn GPU: %.2f sec\n', tCPU, tGPU);

Unfortunately, after execution I receive a message from Windows saying: "Display driver stopped working and has recovered."

I assume this means that Windows did not get a response from my graphics card driver. The script returned without errors:

>> test
CPU::
::Go
GPU::
::Go
On CPU: 11.01 sec
On GPU: 2.97 sec

But whether or not the GPU ran out of memory, MATLAB cannot use the GPU device again until I restart MATLAB. If I don't restart it, all I get is this message from CUDA:

>> test
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
Warning: An unexpected error occurred during CUDA
execution. The CUDA error was:
CUDA_ERROR_LAUNCH_TIMEOUT 
> In test at 1 
CPU::
::Go
GPU::
Error using gpuArray
An unexpected error occurred during CUDA execution.
The CUDA error was:
the launch timed out and was terminated

Error in test (line 21)
A = gpuArray(A);

Does anybody know how to avoid this issue or what I am doing wrong here?

If needed, my GPU Device:

>> gpuDevice

ans = 

  CUDADevice with properties:

                      Name: 'GeForce GTX 660M'
                     Index: 1
         ComputeCapability: '3.0'
            SupportsDouble: 1
             DriverVersion: 6
            ToolkitVersion: 5
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1475e+09
                FreeMemory: 1.9037e+09
       MultiprocessorCount: 2
              ClockRateKHz: 950000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

Solution

The key piece of information is this part of the gpuDevice output:

KernelExecutionTimeout: 1

This means that the host display driver is active on the GPU you are running the compute jobs on. The NVIDIA display driver contains a watchdog timer which kills any task that takes more than a predefined amount of time without yielding control back to the driver for screen refresh. This is intended to prevent a long-running or stuck compute job from rendering the machine unresponsive by freezing the display. The runtime of your MATLAB script is clearly exceeding the display driver watchdog timer limit. Once that happens, the compute context held on the device is destroyed and MATLAB can no longer operate with the device. You might be able to reinitialise the context by calling reset on the device, which I guess will run cudaDeviceReset() under the covers.
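As a rough sketch of the check and the recovery step described above: the snippet below reads the KernelExecutionTimeout property and calls reset on the device if a GPU operation fails. It assumes the Parallel Computing Toolbox functions gpuDevice, wait and reset; the small test matrix is only an illustration, and whether reset actually recovers the device after a driver timeout is not guaranteed.

```matlab
% Sketch only: detect whether the display watchdog applies to the
% selected GPU, and try to recover the device after a failed operation.
dev = gpuDevice;                       % currently selected GPU

if dev.KernelExecutionTimeout
    % The GPU also drives a display, so long-running work may be killed.
    fprintf('%s is subject to the display watchdog timer.\n', dev.Name);
end

try
    A = gpuArray(rand(100));           % small illustrative workload
    B = A * A;
    wait(dev);                         % block until the GPU has finished
catch err
    warning('GPU operation failed: %s', err.message);
    reset(dev);                        % clears GPU memory and reinitialises the device
end
```

Note that reset invalidates every gpuArray variable still in the workspace, so any data needed later has to be copied back with gather before the problem occurs, or recreated afterwards.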

There is a lot of information about this watchdog timer on the web, for example this Stack Overflow question. How to modify the timeout depends on your OS and hardware (on Windows it is governed by the WDDM Timeout Detection and Recovery settings). The simplest way to avoid the problem is to not run CUDA code on a display GPU, or to split your compute jobs into smaller pieces so that no single operation runs longer than the timeout limit. Or just write faster code...
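One way to make the work finer-grained, sketched here under assumptions rather than as a tested fix, is to compute the product one block of columns at a time and synchronise with the device after each block, so that no single piece of queued GPU work runs for long. The block width blockCols is a hypothetical value that would need tuning for the card in question.

```matlab
% Sketch only: split one large matrix product into column blocks so
% that each GPU operation stays short. blockCols is a hypothetical
% tuning parameter, not a recommended value.
n = 2000;
blockCols = 250;

dev = gpuDevice;
A = gpuArray(rand(n, n));
B = gpuArray(rand(n, n));
C = gpuArray(zeros(n, n));             % result matrix, allocated on the GPU

for first = 1:blockCols:n
    last = min(first + blockCols - 1, n);
    % Each iteration multiplies A by a narrow slice of B.
    C(:, first:last) = A * B(:, first:last);
    wait(dev);                         % finish this block before queuing the next
end
```

The same wait(dev) call can be placed inside the original timing loop so that the multiplications are not queued up faster than the GPU completes them. If double precision is not required, converting the inputs with single before calling gpuArray may also shorten each operation considerably, since consumer GeForce cards tend to be much slower in double precision.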
