Large execution time of OpenCL Kernel causes crash

Question

I'm currently building a ray marcher to look at things like the mandelbox. It works great. However, my current program uses each work-item as one ray projected from the eye, which means each work-item does a large amount of work. So when I look at an incredibly complex object, or try to render with high enough precision, my display driver crashes because the kernel takes too long to execute for a single work-item. I'm trying to avoid changing registry values to lengthen the timeout, since I want this application to work on multiple computers.

Is there any way to remedy this? As it stands, the execution of each work-item is completely independent of the work-items nearby. I've contemplated giving the GPU a buffer that stores the current progress of each ray and executing only a small number of iterations per launch. Then I would call the kernel over and over, and the result would hopefully refine a bit more each time. The problem is that I'm unsure how to deal with branching rays (e.g. reflection and refraction) unless I have a maximum number of each to anticipate.
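
A minimal sketch of that resumable idea, written as plain host-side C rather than an OpenCL kernel so it can be run anywhere. Everything named here is hypothetical: `RayState` stands in for one element of the progress buffer, `scene_distance` is a trivial stand-in for a real distance estimator (a flat floor instead of a mandelbox), and `march_some` plays the role of one short kernel launch for one ray. Branching rays are not handled.

```c
#include <stdbool.h>

/* Hypothetical per-ray state; on the GPU this would be one element of a
 * buffer that persists between kernel launches. */
typedef struct {
    float t;      /* distance marched so far along the ray        */
    int   steps;  /* total iterations spent on this ray so far    */
    bool  done;   /* true once the ray has hit or left the scene  */
    bool  hit;    /* true if the ray ended on a surface           */
} RayState;

/* Stand-in distance estimator: a solid floor filling y < 0.
 * A real renderer would evaluate its fractal here instead. */
static float scene_distance(float x, float y, float z)
{
    (void)x;
    (void)z;
    return y;
}

/* Advance one ray by at most `budget` iterations and then return, so a
 * single call never runs for long. Calling it again resumes from the
 * saved state. `dir` is assumed to be normalized. */
void march_some(RayState *s, const float origin[3], const float dir[3],
                int budget)
{
    const float eps   = 1e-4f;   /* hit threshold (assumed value)   */
    const float t_max = 100.0f;  /* give-up distance (assumed value) */

    for (int i = 0; i < budget && !s->done; i++) {
        float d = scene_distance(origin[0] + s->t * dir[0],
                                 origin[1] + s->t * dir[1],
                                 origin[2] + s->t * dir[2]);
        s->steps++;
        if (d < eps) {
            s->hit  = true;
            s->done = true;
        } else {
            s->t += d;            /* safe step: nothing is closer than d */
            if (s->t > t_max)
                s->done = true;   /* ray escaped without hitting anything */
        }
    }
}
```

The host would keep relaunching until every ray reports `done`, for example `while (!state.done) march_some(&state, origin, dir, 3);`. The budget per call is the knob that trades launch overhead against the running time of each launch.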

Does anyone have pointers on what I should do to remedy this problem? I'm quite the greenhorn with OpenCL and have been having this issue for quite some time. I feel as though I'm doing something wrong or misusing OpenCL, principally because my single work-items have a lot of logic behind them, but I don't know how to split the task, as it is just a series of steps, checks, and adjustments.

Recommended answer

The crash you are experiencing is caused by NVIDIA's hardware watchdog timer. The OS may also detect that the GPU is not responding and restart it (at least Windows 7 does this).

You can avoid it in several ways:

  • Improve/optimize your kernel code so it takes less time
  • Buy faster hardware ($$$$)
  • Disable the watchdog timer (this is not an easy task, and not all devices have the feature)
  • Reduce the amount of work queued to the device each time by launching multiple small kernels (NOTE: each kernel launch adds a small overhead)

The easiest and most straightforward solution is the last one. But if you can, try the first one as well.

As an example, a call like this (1000x1000 = 1M work-items, global size):

size_t offset[2] = {0, 0}, global[2] = {1000, 1000};  /* offset, global size */
clEnqueueNDRangeKernel(queue, kernel, 2, offset, global, NULL, 0, NULL, NULL);

It can be split into many small calls: 10x10 launches of 100x100 work-items each, still 1M in total. Since the global size of each launch is now 100 times smaller, the watchdog should not be triggered:

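for (int i = 0; i < 10; i++)
    for (int j = 0; j < 10; j++) {
        /* Each launch covers one 100x100 tile of the 1000x1000 range. */
        size_t offset[2] = {(size_t)i * 100, (size_t)j * 100}, global[2] = {100, 100};
        clEnqueueNDRangeKernel(queue, kernel, 2, offset, global, NULL, 0, NULL, NULL);
    }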
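
The loop above assumes the image size is an exact multiple of the tile size. As a rough sketch of the more general case, the helper below computes the offset and size of every tile and clips the tiles on the right and bottom edges. `Tile` and `make_tiles` are hypothetical names, not part of OpenCL; the code is plain C with no OpenCL dependency, and the actual launch is only indicated in a comment.

```c
#include <stddef.h>

/* One rectangular slice of the 2D global range. */
typedef struct {
    size_t offset[2];  /* would be passed as global_work_offset */
    size_t size[2];    /* would be passed as global_work_size   */
} Tile;

/* Hypothetical helper: split a width x height range into tiles of at most
 * tile_w x tile_h work-items. Edge tiles are clipped, so every work-item is
 * covered exactly once even when the sizes do not divide evenly.
 * Returns the number of tiles written to `out`, or 0 if a tile dimension
 * is zero or `out` (capacity `max_tiles`) is too small. */
size_t make_tiles(size_t width, size_t height,
                  size_t tile_w, size_t tile_h,
                  Tile *out, size_t max_tiles)
{
    if (tile_w == 0 || tile_h == 0)
        return 0;

    size_t n = 0;
    for (size_t y = 0; y < height; y += tile_h) {
        for (size_t x = 0; x < width; x += tile_w) {
            if (n == max_tiles)
                return 0;
            out[n].offset[0] = x;
            out[n].offset[1] = y;
            out[n].size[0] = (width  - x < tile_w) ? width  - x : tile_w;
            out[n].size[1] = (height - y < tile_h) ? height - y : tile_h;
            n++;
        }
    }
    return n;
}

/* With a real queue and kernel, each tile would become one launch:
 *
 *   clEnqueueNDRangeKernel(queue, kernel, 2, tiles[i].offset,
 *                          tiles[i].size, NULL, 0, NULL, NULL);
 */
```

For the 1000x1000 example with 100x100 tiles this yields the same 100 launches as the loop above. Inside the kernel, get_global_id() already includes the offset, so the kernel code should not need to change; note that a non-NULL global offset requires OpenCL 1.1 or later. Whether the driver runs consecutive launches back to back as one batch is implementation-dependent, so calling clFinish(queue) between tiles is one way to force each launch to complete on its own, at some cost in throughput.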
