cudaDeviceSynchronize returned error code 4 after launching
Question
I have written a simple matrix multiplication code using CUDA. When I run it with inputs A (10000 × 10000) and B (10000 × 10000), I receive this message:
cudaDeviceSynchronize returned error code 4 after launching
After adding these calls to measure the run time, I receive an "unspecified launch failure" error.
cudaEventRecord(start);
// here is my kernel call
cudaEventRecord(stop);
cudaEventSynchronize(stop);
This is my kernel call:
mulKernel<<<1, dataSet.threadSize>>>(dev_c, dev_a, dev_b, dataSet.n, dataSet.m, dataSet.p, dataSet.threadSize);
This is my kernel code:
int i = threadIdx.x;
int j, k, sum;
//if(n<=threadSize)
for (; i < n; i += threadSize) {
    for (j = 0; j < p; j++) {
        sum = 0;
        for (k = 0; k < m; k++) {
            sum += A[i * m + k] * B[k * p + j];
        }
        C[i * p + j] = sum;
    }
}
How can I fix this error?
Recommended answer
You are launching 1 block of dataSet.threadSize threads. That is probably far more than the maximum number of threads per block (1024 on Kepler GPUs, I think). Read up on how to choose your grid and block dimensions.
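As a rough sketch of what that could look like, the code below caps the block size at the device limit and spreads the rows over several blocks instead of one. The helper names blocksFor, launchMul and CUDA_CHECK are hypothetical, and the int element type is an assumption based on the `int sum` in the question. It also checks the launch and the synchronize separately and prints cudaGetErrorString, which is usually more informative than a bare error code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: print a readable message and return the error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            return err_;                                            \
        }                                                           \
    } while (0)

// Ceiling division: number of blocks needed so that
// blocks * threadsPerBlock >= rows.
inline int blocksFor(int rows, int threadsPerBlock) {
    return (rows + threadsPerBlock - 1) / threadsPerBlock;
}

// Same computation as in the question, but each thread starts at its
// global index and strides by the total number of launched threads.
// Element type int is an assumption (the question does not show it).
__global__ void mulKernel(int *C, const int *A, const int *B,
                          int n, int m, int p) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        for (int j = 0; j < p; j++) {
            int sum = 0;
            for (int k = 0; k < m; k++) {
                sum += A[i * m + k] * B[k * p + j];
            }
            C[i * p + j] = sum;
        }
    }
}

// Hypothetical host wrapper: clamp the requested block size to what the
// current device supports, then launch enough blocks to cover n rows.
cudaError_t launchMul(int *dev_c, const int *dev_a, const int *dev_b,
                      int n, int m, int p, int requestedThreads) {
    int device = 0;
    CUDA_CHECK(cudaGetDevice(&device));

    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, device));

    int threads = requestedThreads;
    if (threads > prop.maxThreadsPerBlock) threads = prop.maxThreadsPerBlock;
    if (threads < 1) threads = 1;
    int blocks = blocksFor(n, threads);

    mulKernel<<<blocks, threads>>>(dev_c, dev_a, dev_b, n, m, p);
    CUDA_CHECK(cudaGetLastError());       // invalid launch configuration shows up here
    CUDA_CHECK(cudaDeviceSynchronize());  // failures during kernel execution show up here
    return cudaSuccess;
}
```

With the variables from the question, this could be called as launchMul(dev_c, dev_a, dev_b, dataSet.n, dataSet.m, dataSet.p, dataSet.threadSize). If an error still appears at the synchronize step, the problem lies in the kernel's execution (for example an out-of-range index or a very long run time) rather than in the launch configuration.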