PyCUDA/CUDA: Causes of non-deterministic launch failures?


Question

Anyone following CUDA will probably have seen a few of my queries regarding a project I'm involved in, but for those who haven't, I'll summarize. (Apologies in advance for the long question.)

Three kernels: one generates a data set based on some input variables (it deals with bit combinations, so it can grow exponentially), another solves these generated linear systems, and a third reduction kernel gets the final result out. These three kernels are run over and over again as part of an optimisation algorithm for a particular system.

On my dev machine (GeForce 9800GT, running under CUDA 4.0) this works perfectly, all the time, no matter what I throw at it (up to a computational limit based on the stated exponential nature), but on a test machine (4x Tesla S1070s, only one used, under CUDA 3.1) the exact same code (Python base, PyCUDA interface to CUDA kernels) produces exact results for 'small' cases, but in mid-range cases the solving stage fails on random iterations.

Previous problems I've had with this code have been to do with the numerical instability of the problem, and have been deterministic in nature (i.e. failing at exactly the same stage every time), but this one is frankly maddening, as it will fail whenever it wants to.

As such, I don't have a reliable way of breaking the CUDA code out from the Python framework and doing proper debugging, and PyCUDA's debugger support is questionable, to say the least.

I've checked the usual things, like pre-kernel-invocation checks of free memory on the device, and occupancy calculations say the grid and block allocations are fine. I'm not doing any crazy 4.0-specific stuff, I'm freeing everything I allocate on the device at each iteration, and I've fixed all the data types as floats.
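One thing worth noting when chasing failures that seem to move between iterations: kernel launches are asynchronous, so an error from one launch is often only reported by a later CUDA call. A minimal sketch of pinning the error to the launch that caused it (in CUDA C for illustration; the kernel name and sizes are hypothetical, and in PyCUDA the equivalent would be calling `pycuda.driver.Context.synchronize()` after each launch and catching the exception it raises):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the solving stage.
__global__ void solve_stage(float *data, int n) { /* ... */ }

// Check both the launch itself and the kernel's execution.
// Synchronizing after every launch costs performance, but it
// attributes the failure to the exact iteration and kernel.
static void checked_launch(float *d_data, int n, int iter)
{
    dim3 block(256);
    dim3 grid((n + block.x - 1) / block.x);

    solve_stage<<<grid, block>>>(d_data, n);

    cudaError_t err = cudaGetLastError();   // launch-time errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // execution-time errors
    if (err != cudaSuccess)
        fprintf(stderr, "iteration %d: %s\n",
                iter, cudaGetErrorString(err));
}
```

Without the synchronize, the error would surface at whatever CUDA call happens to run next, which can make a deterministic bug look intermittent.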

TL;DR: has anyone come across any gotchas regarding CUDA 3.1 that I haven't seen in the release notes, or any issues with PyCUDA's autoinit memory-management environment, that would cause intermittent launch failures on repeated invocations?

Answer

Have you tried:

cuda-memcheck python yourapp.py

You likely have an out-of-bounds memory access.
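A classic source of an out-of-bounds access that only bites intermittently is a grid rounded up to a whole number of blocks with no guard in the kernel: the surplus threads in the last block read or write past the end of the array, and whether that visibly corrupts anything depends on what happens to sit in the adjacent memory. A hedged sketch (the kernel and names are illustrative, not from the question's code):

```cuda
__global__ void scale(float *data, int n, float k)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Without this guard, threads with idx >= n in the
    // rounded-up final block write out of bounds --
    // exactly the kind of error cuda-memcheck reports.
    if (idx < n)
        data[idx] *= k;
}
```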
