Crashing a kernel gracefully


Question

Follow-up to: CUDA: Stop all other threads

I'm looking for a way to exit a kernel if a "bad condition" occurs. The programming guide says NVCC does not support exception handling. I'm wondering whether there is a user-defined CUDA error code. In other words, if "bad" happens, terminate with this user error code. I doubt there is one, so my other idea would be to cause an error deliberately.

Something like: if "bad" happens, divide by zero. But I'm unsure whether one thread dividing by zero is enough to crash the whole kernel, or only that thread.

Is there a better approach to terminating a kernel?

Recommended answer

You should first read this question and the answers by harrism and tera (asked/answered yesterday).

You could use something like this:

if (there_is_an_error) {
  *status = MY_ERROR_CODE; // store to device pointer
  __threadfence();         // ensure store issued before trap
  asm("trap;");            // kill kernel with error
}

This does not exactly satisfy your condition of "graceful", in my opinion. Trap causes the kernel to exit and the runtime to report cudaErrorUnknown. But since kernel execution is asynchronous, you will need to synchronize your stream / device in order to catch this error, which means synchronizing after every kernel call, unless you are OK with having imprecise errors (i.e. you may not catch the error code until after subsequent CUDA API calls).
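As a rough sketch of the host side, the example below launches a kernel that traps and then synchronizes to observe the failure. Several things here are assumptions of mine rather than part of the answer: the names guarded_kernel and run_guarded, the value of MY_ERROR_CODE, and the choice to keep the status word in mapped pinned host memory (with __threadfence_system() instead of __threadfence()) so that the host can still read it once the context is in an error state. It also assumes a 64-bit platform with unified addressing, where mapped host memory is available without extra device flags.

```cuda
#include <cuda_runtime.h>

// Hypothetical user-defined code; the CUDA runtime knows nothing about it.
constexpr int MY_ERROR_CODE = 42;

__global__ void guarded_kernel(const int *data, int n, volatile int *status)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] < 0) {      // the "bad condition" in this sketch
        *status = MY_ERROR_CODE;     // record why we are aborting
        __threadfence_system();      // order the store before the trap, host included
        asm("trap;");                // abort the kernel with an error
    }
}

// Copies host_data to the device, runs the kernel, and synchronizes.
// Returns the status word; *sync_result receives what the runtime reported.
int run_guarded(const int *host_data, int n, cudaError_t *sync_result)
{
    int *status = nullptr;           // host view of the status word
    int *d_status = nullptr;         // device view of the same memory
    cudaHostAlloc((void **)&status, sizeof(int), cudaHostAllocMapped);
    *status = 0;
    cudaHostGetDevicePointer((void **)&d_status, status, 0);

    int *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(int));
    cudaMemcpy(d_data, host_data, n * sizeof(int), cudaMemcpyHostToDevice);

    guarded_kernel<<<(n + 255) / 256, 256>>>(d_data, n, d_status);

    // The launch returns immediately; the trap only becomes visible here.
    *sync_result = cudaDeviceSynchronize();
    int code = *status;

    // Cleanup may itself report errors if the kernel trapped.
    cudaFree(d_data);
    cudaFreeHost(status);
    return code;
}
```

Calling run_guarded with an array that contains a negative value should leave *sync_result different from cudaSuccess. After such a failure, later runtime calls typically keep returning an error until the device is reset, which is one more reason this is not really "graceful".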

But this is just the way kernel error handling works in CUDA: well-written code should synchronize in debug builds to check for kernel errors, and settle for imprecise error messages in release builds. Unfortunately, I don't think there is a more graceful way than that.
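One way to get that debug/release split is a small checking helper. The sketch below is only an illustration: check_kernel, CHECK_KERNEL, scale, and scale_on_device are hypothetical names, and using NDEBUG to distinguish debug from release builds is an assumption about the build setup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, not part of the CUDA API.
inline cudaError_t check_kernel(const char *file, int line)
{
    // Launch-time problems (e.g. a bad configuration) show up immediately.
    cudaError_t err = cudaGetLastError();
#ifndef NDEBUG
    // Debug builds only: wait for the kernel so execution errors surface here.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
#endif
    if (err != cudaSuccess)
        fprintf(stderr, "%s:%d: CUDA error: %s\n", file, line,
                cudaGetErrorString(err));
    return err;
}
#define CHECK_KERNEL() check_kernel(__FILE__, __LINE__)

__global__ void scale(float *x, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Scales a host array in place on the GPU; returns the first error seen.
cudaError_t scale_on_device(float *host, int n, float a)
{
    float *d_x = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_x, n * sizeof(float));
    if (err != cudaSuccess) return err;
    cudaMemcpy(d_x, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d_x, n, a);
    err = CHECK_KERNEL();

    // In release builds a kernel failure would surface at this blocking copy
    // instead, which is the "imprecise" reporting described above.
    if (err == cudaSuccess)
        err = cudaMemcpy(host, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    return err;
}
```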

Edit: on compute capability 2.0 and later you can use assert() to exit with an error in debug builds. It was unclear whether this is what you want, though.
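A minimal sketch of the assert() variant might look like the following. The names checked_kernel and run_checked and the "no negative values" condition are made up for illustration. As with host code, defining NDEBUG before including the assert header compiles the check out, so this only fires in builds where assertions are enabled.

```cuda
#include <cassert>
#include <cuda_runtime.h>

__global__ void checked_kernel(const int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        assert(data[i] >= 0);   // a failing assert stops the kernel with an error
}

// Runs the kernel on a copy of host_data and returns the synchronization result.
cudaError_t run_checked(const int *host_data, int n)
{
    int *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(int));
    cudaMemcpy(d_data, host_data, n * sizeof(int), cudaMemcpyHostToDevice);

    checked_kernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // The failure is reported at the next synchronization point.
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d_data);
    return err;
}
```

When the assertion fails, the runtime reports cudaErrorAssert and prints the failing expression with its file and line to stderr, which says more than a bare trap does but still does not carry a user-defined code.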
