Running more than one CUDA application on one GPU


Question

The CUDA documentation does not specify how many CUDA processes can share one GPU. For example, if the same user launches more than one CUDA program on a system with only one GPU card installed, what is the effect? Is correctness of execution guaranteed? How does the GPU schedule tasks in this case?

Accepted answer

CUDA activity from independent host processes will normally create independent CUDA contexts, one for each process. Thus, the CUDA activity launched from separate host processes will take place in separate CUDA contexts, on the same device.

CUDA activity in separate contexts will be serialized. The GPU will execute the activity from one process, and when that activity is idle, it can and will context-switch to another context to complete the CUDA activity launched from the other process. The detailed inter-context scheduling behavior is not specified. (Running multiple contexts on a single GPU also cannot normally violate basic GPU limits, such as memory availability for device allocations.) Note that the inter-context switching/scheduling behavior is unspecified and may also vary depending on machine setup. Casual observation or micro-benchmarking may suggest that kernels from separate processes on newer devices can run concurrently (outside of MPS) but this is not correct. Newer machine setups may have a time-sliced rather than round-robin behavior, but this does not change the fact that at any given instant in time, code from only one context can run.
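One way to observe this yourself is a minimal busy-wait kernel, run as two simultaneous processes. This is a hypothetical sketch (the file name, cycle count, and timing tool are assumptions, not from the original answer); profiling both processes with a tool such as Nsight Systems should show the kernels time-sliced rather than overlapping.

```cuda
// spin.cu — compile with: nvcc -o spin spin.cu
// Then launch two copies at once (./spin & ./spin &) and profile them.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }  // busy-wait on the GPU
}

int main() {
    // The first runtime API call implicitly creates this process's own
    // CUDA context on the device.
    spin<<<1, 1>>>(2000000000LL);           // roughly a second on many GPUs
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel finished: %s\n", cudaGetErrorString(err));
    return 0;
}
```

Because each process gets its own context, the two `spin` kernels cannot execute at the same instant; the GPU context-switches between them according to the unspecified scheduling behavior described above.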

The "exception" to this case (serialization of GPU activity from independent host processes) would be the CUDA Multi-Process Service (MPS). In a nutshell, MPS acts as a "funnel" to collect CUDA activity emanating from several host processes, and run that activity as if it emanated from a single host process. The principal benefit is to avoid the serialization of kernels which might otherwise be able to run concurrently. The canonical use-case would be for launching multiple MPI ranks that all intend to use a single GPU resource.
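As a rough sketch of how MPS is used on a single-GPU machine (assuming the MPS binaries shipped with the NVIDIA driver are installed; paths and rank counts are illustrative):

```shell
# Start the MPS control daemon; clients started afterwards in the same
# environment are routed through MPS automatically.
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d

# ... run the CUDA processes / MPI ranks that share the GPU, e.g.:
# mpirun -np 4 ./my_cuda_app

# Shut the daemon down when done.
echo quit | nvidia-cuda-mps-control
```

With the daemon running, kernels from the separate ranks can genuinely overlap on the device, since they execute under a single shared context rather than being time-sliced between per-process contexts.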

Note that the above description applies to GPUs which are in the "Default" compute mode. GPUs in "Exclusive Process" or "Exclusive Thread" compute modes will reject any attempts to create more than one process/context on a single device. In one of these modes, attempts by other processes to use a device already in use will result in a CUDA API reported failure. The compute mode is modifiable in some cases using the nvidia-smi utility.
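For example, the compute mode can be queried and (with sufficient privileges) changed like this; the GPU index `0` is an assumption for a single-GPU system:

```shell
# Query the current compute mode of GPU 0.
nvidia-smi -i 0 --query-gpu=compute_mode --format=csv

# Reject additional processes/contexts on the device (requires root).
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Restore the default mode, allowing multiple contexts again.
sudo nvidia-smi -i 0 -c DEFAULT
```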
