How do I diagnose a CUDA launch failure due to being out of resources?

Question

I'm getting an out-of-resources error when trying to launch a CUDA kernel (through PyCUDA), and I'm wondering whether it's possible to get the system to tell me which resource I'm short on. Obviously the system knows which resource has been exhausted; I just want to query that as well.

I've used the occupancy calculator and everything seems okay, so either there's a corner case it doesn't cover or I'm using it wrong. I know it's not registers (which seem to be the usual culprit) because I'm using <= 63, and the launch still fails with a 1x1x1 block and a 1x1 grid on a CC 2.1 device.

Thanks for any help. I posted a thread on the NVIDIA forums:

http://forums.nvidia.com/index.php?showtopic=206261&st=0

But I got no responses. If the answer is "you can't ask the system for that information", that would be nice to know too (sort of... ;).

Edit:

The highest register usage I've seen is 63. I edited the above to reflect that.

Recommended answer

I think PyCUDA uses the CUDA driver API, so the following may be what is wrong: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES can occur when you launch a kernel with cuLaunch() and either do not specify enough arguments or specify the wrong size for the arguments. With PyCUDA it is fairly easy for the arguments you actually pass to diverge from the parameter list the kernel requires, so you might want to check how you are calling your kernels.

I think this is a poorly named error code for this situation...
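One way to act on the suggestion above is to compare, on the host side, the arguments you are about to pass against the kernel's declared parameter list before launching. The following is only a minimal sketch of that idea using the Python standard library: the helper `find_arg_mismatches`, the `EXPECTED` layout, and the `scale` kernel signature are all hypothetical, and `ctypes` types stand in for whatever sized values you really hand to PyCUDA. It does not call PyCUDA or the driver, and it cannot confirm that a mismatch is the cause of your particular error.

```python
import ctypes


def find_arg_mismatches(expected_types, args):
    """Return a list of human-readable problems; an empty list means the
    argument count and per-argument byte sizes match the declared layout."""
    problems = []
    if len(args) != len(expected_types):
        problems.append(
            f"expected {len(expected_types)} arguments, got {len(args)}"
        )
    # zip() stops at the shorter list, so sizes are compared pairwise
    # only for positions present on both sides.
    for index, (want, got) in enumerate(zip(expected_types, args)):
        want_size = ctypes.sizeof(want)
        got_size = ctypes.sizeof(got)
        if want_size != got_size:
            problems.append(
                f"argument {index}: expected {want_size} bytes, got {got_size}"
            )
    return problems


# Hypothetical kernel: scale(float *out, const float *in, float factor, int n)
EXPECTED = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_float, ctypes.c_int]

# Assumed mistake for illustration: a double passed where the kernel
# declares a float, and the trailing int left out entirely.
suspect_args = [ctypes.c_void_p(0), ctypes.c_void_p(0), ctypes.c_double(2.0)]

for problem in find_arg_mismatches(EXPECTED, suspect_args):
    print(problem)
```

In real PyCUDA code the equivalent check is to make sure every scalar is an explicitly sized NumPy value (for example numpy.float32 rather than a 64-bit float) and that the number of arguments matches the kernel signature.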
