Multiple processes launching CUDA kernels in parallel


Problem Description


I know that NVIDIA GPUs with compute capability 2.x or greater can execute up to 16 kernels concurrently. However, my application spawns 7 "processes", and each of these 7 processes launches CUDA kernels.


My first question is: what is the expected behavior of these kernels? Will they also execute concurrently, or, since they are launched by different processes, will they execute sequentially?


I am confused because the CUDA C programming guide says:


"A kernel from one CUDA context cannot execute concurrently with a kernel from another CUDA context." This brings me to my second question, what are CUDA "contexts"?

Thanks!

Answer


A CUDA context is a virtual execution space that holds the code and data owned by a host thread or process. Only one context can ever be active on a GPU with all current hardware.


So to answer your first question, if you have seven separate threads or processes all trying to establish a context and run on the same GPU simultaneously, they will be serialised and any process waiting for access to the GPU will be blocked until the owner of the running context yields. There is, to the best of my knowledge, no time slicing and the scheduling heuristics are not documented and (I would suspect) not uniform from operating system to operating system.


You would be better off launching a single worker thread that holds the GPU context and using message passing from the other threads to push work onto the GPU. Alternatively, there is a context migration facility in the CUDA driver API, but it only works with threads from the same process, and the migration mechanism has latency and host CPU overhead.

