cudaDeviceSynchronize returned error code 4 after launching

Problem description

I have written a simple matrix multiplication program using CUDA. When I run it with inputs of size A(10000*10000) * B(10000*10000), I receive this message:

cudaDeviceSynchronize returned error code 4 after launching

After adding the following calls to measure the run time, I receive an "unspecified launch failure" error.

cudaEventRecord(start);
// here is my kernel call
cudaEventRecord(stop);
cudaEventSynchronize(stop); 

This is my kernel call:

mulKernel<<<1, dataSet.threadSize>>>(dev_c, dev_a, dev_b, dataSet.n, dataSet.m, dataSet.p, dataSet.threadSize);

This is my kernel code:

    int i = threadIdx.x;
    int j, k, sum;
    //if(n<=threadSize)
    for(; i < n; i+=threadSize){
        for(j = 0; j < p; j++){
            sum = 0;
            for(k = 0; k < m; k++){
                sum += A[i * m + k] * B[k * p + j];
            }
            C[i * p + j] = sum;
        }
    }

How can I fix this error?

Recommended answer

You are launching 1 block of dataSet.threadSize threads. That would be way more than the maximum number of threads allowed in a single block (1024 on Kepler GPUs, I think). Read up on how to choose your grid and block dimensions so that the work is spread across several blocks instead of one.
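A minimal sketch of what spreading the rows over several blocks could look like, assuming the matrices hold int values (the question does not show the kernel signature). The CUDA_CHECK macro, the blocksNeeded helper, the block size of 256, and the small test sizes are hypothetical choices made for illustration; they are not from the original post.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: stop with a readable message if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: blocks needed so that blocks * threadsPerBlock >= rows.
inline int blocksNeeded(int rows, int threadsPerBlock) {
    return (rows + threadsPerBlock - 1) / threadsPerBlock;
}

// Same one-row-per-thread scheme as in the question, but the row index and
// the stride are computed over the whole grid. Element type int is assumed.
__global__ void mulKernel(int *C, const int *A, const int *B,
                          int n, int m, int p) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        for (int j = 0; j < p; j++) {
            int sum = 0;
            for (int k = 0; k < m; k++) {
                sum += A[i * m + k] * B[k * p + j];
            }
            C[i * p + j] = sum;
        }
    }
}

int main() {
    const int n = 64, m = 32, p = 16;  // small illustrative sizes

    // Ask the device for its per-block limit instead of hard-coding it.
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, 0));
    int threads = 256;
    if (threads > prop.maxThreadsPerBlock) threads = prop.maxThreadsPerBlock;
    int blocks = blocksNeeded(n, threads);

    // All-ones inputs, so every entry of C should equal m.
    std::vector<int> hA(n * m, 1), hB(m * p, 1), hC(n * p, 0);
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CUDA_CHECK(cudaMalloc(&dA, hA.size() * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&dB, hB.size() * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&dC, hC.size() * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dA, hA.data(), hA.size() * sizeof(int),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, hB.data(), hB.size() * sizeof(int),
                          cudaMemcpyHostToDevice));

    mulKernel<<<blocks, threads>>>(dC, dA, dB, n, m, p);
    CUDA_CHECK(cudaGetLastError());       // catches an invalid launch configuration
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(hC.data(), dC, hC.size() * sizeof(int),
                          cudaMemcpyDeviceToHost));
    std::printf("C[0] = %d (should equal m = %d)\n", hC[0], m);

    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFree(dC));
    return 0;
}
```

Checking cudaGetLastError() right after the launch, separately from cudaDeviceSynchronize(), helps tell a rejected launch configuration apart from a failure that happens while the kernel is executing.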
