How should I interpret this CUDA error?


Question

I am teaching myself CUDA with pyCUDA. In this exercise, I want to send a simple array of 1024 floats over to the GPU and store it in shared memory. As I specify in my arguments below, I run this kernel on just a single block with 1024 threads.

import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.autoinit
import numpy as np
import matplotlib.pyplot as plt

arrayOfFloats = np.float64(np.random.sample(1024))
mod = SourceModule("""
  __global__ void myVeryFirstKernel(float* arrayOfFloats) {
    extern __shared__ float sharedData[];

    // Copy data to shared memory.
    sharedData[threadIdx.x] = arrayOfFloats[threadIdx.x];
  }
""")
func = mod.get_function('myVeryFirstKernel')
func(cuda.InOut(arrayOfFloats), block=(1024, 1, 1), grid=(1, 1))
print str(arrayOfFloats)

Strangely, I am getting this error.

[dfaux@harbinger CUDA_tutorials]$ python sharedMemoryExercise.py 
Traceback (most recent call last):
  File "sharedMemoryExercise.py", line 17, in <module>
    func(cuda.InOut(arrayOfFloats), block=(1024, 1, 1), grid=(1, 1))
  File "/software/linux/x86_64/epd-7.3-1-pycuda/lib/python2.7/site-packages/pycuda-2012.1-py2.7-linux-x86_64.egg/pycuda/driver.py", line 377, in function_call
    Context.synchronize()
pycuda._driver.LaunchError: cuCtxSynchronize failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: launch failed

I have tried to debug this error by changing the type of elements I am sending to my GPU (instead of float64, I use float32 for instance). I have also tried altering my block and grid sizes to no avail.
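As an aside on the dtype experiments mentioned above: the kernel's parameter is declared as float*, which is a 4-byte type, while np.float64 produces 8-byte elements. A quick host-side check (a sketch using only numpy, independent of the GPU) shows the size mismatch the kernel would see:

```python
import numpy as np

# The kernel parameter is declared as float* (4 bytes per element),
# so the host array should be float32. A float64 array has a different
# byte layout than what the kernel's indexing assumes.
a64 = np.float64(np.random.sample(1024))
a32 = a64.astype(np.float32)

print(a64.nbytes)  # 8192 bytes on the host
print(a32.nbytes)  # 4096 bytes -- what float* indexing of 1024 elements expects
```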

What could be wrong? What is a dead context? Any advice or ideas are appreciated.

Answer

One problem I see with your code is that you use extern __shared__, which means you need to supply the size of the shared memory when you launch the kernel.

In pycuda this is done by:
func(cuda.InOut(arrayOfFloats), block=(1024, 1, 1), grid=(1, 1), shared=smem_size)
where smem_size is the size of the shared memory in bytes.

In your case, smem_size = 1024*sizeof(float) = 4096 bytes.
