Running more than one CUDA application on one GPU

Problem Description

The CUDA documentation does not specify how many CUDA processes can share one GPU. For example, if the same user launches several CUDA programs on a system with only one GPU card installed, what is the effect? Is correctness of execution guaranteed? How does the GPU schedule tasks in this case?

Recommended Answer

CUDA activity from independent host processes will normally create independent CUDA contexts, one per process. Thus, the CUDA activity launched from separate host processes will take place in separate CUDA contexts on the same device.

CUDA activity in separate contexts will be serialized. The GPU executes the activity from one process, and when that activity is idle, it can and will context-switch to another context to complete the CUDA activity launched from the other process. The detailed inter-context scheduling behavior is not specified. (Running multiple contexts on a single GPU also cannot normally violate basic GPU limits, such as memory availability for device allocations.) Note that the inter-context switching/scheduling behavior is unspecified and may also vary with machine setup. Casual observation or micro-benchmarking may suggest that kernels from separate processes on newer devices can run concurrently (outside of MPS), but this is not correct. Newer machine setups may exhibit time-sliced rather than round-robin behavior, but this does not change the fact that at any given instant in time, code from only one context can run. A simple way to observe this is sketched below.
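
Here is a minimal sketch (not part of the original answer) of a program that can be used to observe this serialization: it launches a single long-running kernel and reports its wall time. The kernel and file names are illustrative.

    // spin.cu - occupy the GPU for a while and report the kernel's wall time.
    #include <cstdio>
    #include <cuda_runtime.h>

    // Busy-wait for roughly `cycles` GPU clock cycles.
    __global__ void spin(long long cycles)
    {
        long long start = clock64();
        while (clock64() - start < cycles) { }
    }

    int main()
    {
        cudaEvent_t begin, end;
        cudaEventCreate(&begin);
        cudaEventCreate(&end);

        cudaEventRecord(begin);
        spin<<<1, 1>>>(2000000000LL);   // roughly 1-2 seconds at typical clocks
        cudaEventRecord(end);
        cudaEventSynchronize(end);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, begin, end);
        printf("kernel wall time: %.0f ms\n", ms);
        return 0;
    }

Compiled with nvcc and launched as two simultaneous processes (e.g. ./spin & ./spin), on a default-mode GPU without MPS each instance will typically report a noticeably longer time than a single instance running alone, consistent with the contexts time-sharing the device; the exact numbers depend on the unspecified scheduling behavior.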

The "exception" to this case (serialization of GPU activity from independent host processes) would be the CUDA Multi-Process Service (MPS). In a nutshell, MPS acts as a "funnel" to collect CUDA activity emanating from several host processes and run that activity as if it emanated from a single host process. The principal benefit is to avoid the serialization of kernels which might otherwise be able to run concurrently. The canonical use case would be launching multiple MPI ranks that all intend to use a single GPU resource.

Note that the above description applies to GPUs in the "Default" compute mode. GPUs in the "Exclusive Process" or "Exclusive Thread" compute modes will reject any attempt to create more than one process/context on a single device. In one of these modes, attempts by other processes to use a device already in use will result in a failure reported by the CUDA API. The compute mode is modifiable in some cases using the nvidia-smi utility, for example:
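
Assuming GPU index 0 and sufficient privileges, the standard nvidia-smi options look like this:

    # Query the current compute mode of GPU 0.
    nvidia-smi -i 0 --query-gpu=compute_mode --format=csv

    # Set Exclusive Process mode; "-c DEFAULT" restores the default mode.
    nvidia-smi -i 0 -c EXCLUSIVE_PROCESS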
