Strange error while using cudaMemcpy: cudaErrorLaunchFailure


Question


I have CUDA code that works as follows:

cpyDataGPU --> CPU     

while(nsteps){

    cudaKernel1<<<,>>>
    function1();    
    cudaKernel2<<<,>>>

}

cpyDataGPU --> CPU


And function1 looks like this:

function1{

    cudaKernel3<<<,>>>
    cudaKernel4<<<,>>>

    cpyNewNeedDataCPU --> GPU   // Error line
    cudaKernel5<<<,>>>
}

According to the cudaMemcpy documentation, this function can return four different error codes: "cudaSuccess", "cudaErrorInvalidValue", "cudaErrorInvalidDevicePointer" and "cudaErrorInvalidMemcpyDirection".

However, I get the following error: "cudaErrorLaunchFailure": "An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA."

Does anybody have any idea why I am getting this error? What am I doing wrong?

Does it make sense to copy data CPU-->GPU after the previous kernel calls? The problem is that I have to copy that data here at each step because it may change in each "while" step.

Thanks a lot in advance!!

Answer

The documentation you linked also says:

Note that this function may also return error codes from previous, asynchronous launches.

When you call cudaMemcpy() the program will wait for all preceding GPU work to complete (remember that kernel launches are asynchronous), then check the status and execute the memcpy if everything is ok. In this case, however, one of your kernels has failed.
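The deferred reporting described above can be reproduced with a small sketch. The kernel `brokenKernel` and the helper `memcpyAfterBrokenKernel` are hypothetical names invented for this illustration; they are not the asker's code. The kernel is handed a null device pointer on purpose, so the fault happens inside the kernel, yet it is the later cudaMemcpy() that reports it. The exact error code depends on the CUDA version and may be something other than cudaErrorLaunchFailure (for example cudaErrorIllegalAddress), so the sketch only prints whatever string the runtime returns.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical faulty kernel: every thread writes through the pointer it is given.
__global__ void brokenKernel(int *p) {
    p[threadIdx.x] = 42;
}

// Launches the faulty kernel with a null pointer, then performs an unrelated,
// perfectly valid host-to-device copy and returns the status of that copy.
cudaError_t memcpyAfterBrokenKernel() {
    int host[32] = {0};
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, sizeof(host));
    if (err != cudaSuccess) return err;

    // The launch is asynchronous: control returns before the kernel has run.
    brokenKernel<<<1, 32>>>(nullptr);

    // This mainly catches launch-time problems (e.g. a bad configuration);
    // the fault inside the kernel has usually not surfaced at this point.
    cudaError_t launchErr = cudaGetLastError();
    printf("after launch: %s\n", cudaGetErrorString(launchErr));

    // cudaMemcpy waits for the preceding kernel, so the kernel's fault is
    // reported here even though the copy arguments themselves are valid.
    cudaError_t copyErr =
        cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    printf("after cudaMemcpy: %s\n", cudaGetErrorString(copyErr));

    // After such a fault the context is unusable; reset it before further
    // CUDA work (cudaDeviceReset() is the current name for cudaThreadExit()).
    cudaDeviceReset();
    return copyErr;
}
```

Calling `memcpyAfterBrokenKernel()` from `main` should print a non-success status on the cudaMemcpy line, which matches the situation in the question: the copy is blamed for an error that an earlier kernel caused.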

The most common reason for this error is an out-of-bounds access, much like a segfault in x86 territory.

cudaErrorLaunchFailure : An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA.

The easiest way to debug this would be to use cuda-memcheck. Alternatively, you can identify which kernel failed by calling cudaDeviceSynchronize() after each kernel launch and checking the return value.
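A minimal sketch of the second suggestion, checking after every launch, might look like the following. The kernels `stageA` and `stageB` and the helpers `kernelOk` and `firstFailingKernel` are hypothetical stand-ins, not the kernels from the question; `stageB` is deliberately launched with a null device pointer so that there is something to find.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the kernels in the question.
__global__ void stageA(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1.0f;
}

__global__ void stageB(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Checks the most recent kernel: first for launch errors, then, by
// synchronizing, for faults raised while the kernel was executing.
bool kernelOk(const char *name) {
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", name, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Runs the kernels in order and returns the name of the first one that
// fails, or nullptr if all of them succeed.
const char *firstFailingKernel() {
    const int n = 256;
    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return "cudaMalloc";

    const char *failed = nullptr;

    stageA<<<1, n>>>(dev, n);
    if (!kernelOk("stageA")) failed = "stageA";

    if (!failed) {
        stageB<<<1, n>>>(nullptr, n);  // deliberate bug: invalid device pointer
        if (!kernelOk("stageB")) failed = "stageB";
    }

    // A faulted context cannot be reused, so reset it; otherwise just free.
    if (failed) cudaDeviceReset(); else cudaFree(dev);
    return failed;
}
```

Synchronizing after every launch serializes the GPU work, so a check like this is best kept to debug builds. For the tool-based route, note that recent CUDA toolkits ship compute-sanitizer in place of cuda-memcheck; running the program under either tool typically reports the offending kernel and the faulting access directly.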
