Why do we need cudaDeviceSynchronize(); in kernels with device-printf?

Question

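#include <cstdio>

// Kernel: each thread prints its index and the value passed from the host.
__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    // Launch one block of five threads.
    helloCUDA<<<1, 5>>>(1.2345f);
    // Block the host until the device has finished its work.
    cudaDeviceSynchronize();
    return 0;
}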

Why is cudaDeviceSynchronize(); needed here, when in many other places (for example here) it is not required after a kernel call?

Recommended answer

A kernel launch is asynchronous. This means it returns control to the CPU thread immediately after starting the kernel on the GPU, before the kernel has finished executing.
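The asynchronous return can be observed directly. The following is only a minimal sketch, not code from the original post: `slowKernel` and `launchAndQuery` are hypothetical names, and the cycle count is an arbitrary value chosen so the kernel runs long enough to notice. It uses `cudaStreamQuery` on the default stream, which reports `cudaErrorNotReady` while work is still pending instead of blocking. On a setup where launches are forced to be synchronous (for example with `CUDA_LAUNCH_BLOCKING=1`), the first query may already report success.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for roughly `cycles` clock ticks, then prints.
__global__ void slowKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
    printf("kernel finished on thread %d\n", threadIdx.x);
}

// Hypothetical helper: launch the kernel and immediately ask, without blocking,
// whether the default stream still has pending work.
cudaError_t launchAndQuery(long long cycles)
{
    slowKernel<<<1, 1>>>(cycles);
    return cudaStreamQuery(0);  // cudaErrorNotReady means the kernel is still running
}

int main()
{
    cudaError_t status = launchAndQuery(100000000LL);
    printf("right after launch: %s\n", cudaGetErrorName(status));

    // Now wait for the kernel; its printf output becomes visible at this point.
    cudaDeviceSynchronize();
    printf("after synchronize: %s\n", cudaGetErrorName(cudaStreamQuery(0)));
    return 0;
}
```

If the host line is printed while the status is still `cudaErrorNotReady`, the CPU thread has clearly moved on before the kernel completed.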

So what is the next thing in the CPU thread here? Application exit.

At application exit, its ability to send output to the standard output is terminated by the OS.

Thus the output that is generated later by the kernel has nowhere to go, and you won't see it.

On the other hand, if you use cudaDeviceSynchronize(), then the kernel is guaranteed to finish (and the output from the kernel will find a waiting standard output queue), before the application is allowed to exit.
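As a hedged sketch of how this might look with error checking added, the version below wraps the launch and the wait in a hypothetical helper, `launchHelloAndWait`, which is not part of the original post. It relies on the documented behavior that device-side `printf` writes into a buffer that is flushed at synchronization points such as `cudaDeviceSynchronize()`, and is not flushed automatically at program exit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

// Hypothetical helper: launch the kernel, then wait for it,
// returning the first error encountered.
cudaError_t launchHelloAndWait(float f)
{
    helloCUDA<<<1, 5>>>(f);

    // Reports errors from the launch itself (e.g. a bad configuration).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Blocks until the kernel is done; the device printf buffer is flushed here.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = launchHelloAndWait(1.2345f);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

This may also explain why many examples get by without an explicit synchronize: a blocking `cudaMemcpy` back to the host after the kernel waits for the preceding work in the default stream, so the synchronization is implicit. A kernel whose only visible effect is `printf` has no such call after it.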
