Multiple CUDA contexts for one device - any sense?


Question

I thought I had the grasp of this but apparently I do not :) I need to perform parallel H.264 stream encoding with NVENC from frames that are not in any of the formats accepted by the encoder, so I have the following code pipeline:

  • A callback informing that a new frame has arrived is called
  • I copy the frame to CUDA memory and perform the needed color space conversions (only the first cuMemcpy is synchronous, so I can return from the callback; all pending operations are pushed onto a dedicated stream)
  • I push an event onto the stream and have another thread waiting for it; as soon as it is set I take the CUDA memory pointer with the frame in the correct color space and feed it to the encoder
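The three steps above can be sketched with the driver API roughly as follows (a minimal illustration; the conversion-kernel launch and the NVENC call are placeholders, not real API names):

```cuda
#include <cuda.h>

typedef struct {
    CUstream    stream;  // dedicated stream per transcoder pipeline
    CUevent     done;    // signaled when the converted frame is ready
    CUdeviceptr src;     // device copy of the incoming frame
    CUdeviceptr nv12;    // frame converted to an encoder-friendly format
} FramePipe;

// Called from the "new frame arrived" callback.
void on_frame(FramePipe *p, const void *host_frame, size_t bytes)
{
    // Only this copy is synchronous, so the callback can return
    // while the remaining work is still queued on the stream.
    cuMemcpyHtoD(p->src, host_frame, bytes);

    // Color-space conversion launched asynchronously on p->stream
    // (kernel and launch configuration omitted):
    // cuLaunchKernel(convert_kernel, ..., p->stream, args, NULL);

    // Record an event so the encoder-feeding thread can wait on it.
    cuEventRecord(p->done, p->stream);
}

// Running on the thread that feeds the encoder.
void feed_encoder(FramePipe *p)
{
    cuEventSynchronize(p->done);  // block until the conversion finished
    // ...hand p->nv12 to NVENC here...
}
```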

For some reason I had the assumption that I need a dedicated context for each thread if I perform this pipeline in parallel threads. The code was slow, and after some reading I understood that context switching is actually expensive; I then came to the conclusion that it makes no sense, since a context owns the whole GPU and I therefore lock out any parallel processing from other transcoder threads.

Question 1: In this scenario am I good with using a single context and an explicit stream created on this context for each thread that performs the mentioned pipeline?

Question 2: Can someone enlighten me on what is the sole purpose of the CUDA device context? I assume it makes sense in a multiple GPU scenario, but are there any cases where I would want to create multiple contexts for one GPU?

Answer

Question 1: In this scenario am I good with using a single context and an explicit stream created on this context for each thread that performs the mentioned pipeline?

You should be fine with a single context.
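A minimal sketch of that setup, assuming the driver API (error checking omitted): one context shared by all transcoder threads, each thread owning its own stream.

```cuda
#include <cuda.h>

CUcontext ctx;  // the single context, shared by all threads

void init_once(void)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);  // one context for the whole process
}

void transcoder_thread(void)
{
    CUstream stream;
    cuCtxSetCurrent(ctx);  // bind the shared context to this thread
    cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING);

    // ...queue copies, conversion kernels, and events on `stream`;
    // work in different streams may overlap on the same GPU...

    cuStreamDestroy(stream);
}
```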

Question 2: Can someone enlighten me on what is the sole purpose of the CUDA device context? I assume it makes sense in a multiple GPU scenario, but are there any cases where I would want to create multiple contexts for one GPU?

The CUDA device context is discussed in the programming guide. It represents all of the state (memory map, allocations, kernel definitions, and other state-related information) associated with a particular process (i.e. associated with that particular process' use of a GPU). Separate processes will normally have separate contexts (as will separate devices), as these processes have independent GPU usage and independent memory maps.

If you have multi-process usage of a GPU, you will normally create multiple contexts on that GPU. As you've discovered, it's possible to create multiple contexts from a single process, but not usually necessary.
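If a single process does hold several contexts on one GPU, it has to switch between them explicitly. A sketch of what that looks like with the driver API (error checks omitted):

```cuda
#include <cuda.h>
#include <stddef.h>

void two_contexts_demo(CUdevice dev)
{
    CUcontext ctxA, ctxB;
    cuCtxCreate(&ctxA, 0, dev);  // creating a context also makes it current
    cuCtxCreate(&ctxB, 0, dev);  // ctxB is now the current context

    cuCtxPushCurrent(ctxA);      // switch this thread to ctxA
    // ...allocations and kernels here live in ctxA and are
    // invisible from ctxB...
    cuCtxPopCurrent(NULL);       // back to ctxB

    cuCtxDestroy(ctxB);
    cuCtxDestroy(ctxA);
}
```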

And yes, when you have multiple contexts, kernels launched in those contexts will require context switching to go from one kernel in one context to another kernel in another context. Those kernels cannot run concurrently.

The CUDA runtime API manages contexts for you; you normally don't interact with a CUDA context explicitly when using the runtime API. With the driver API, however, the context is explicitly created and managed.
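The contrast between the two APIs, as a sketch (error handling omitted):

```cuda
#include <cuda.h>
#include <cuda_runtime.h>

void runtime_style(void)
{
    void *ptr;
    cudaSetDevice(0);
    cudaMalloc(&ptr, 1024);  // primary context created implicitly
    cudaFree(ptr);
}

void driver_style(void)
{
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr dptr;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);  // context created explicitly
    cuMemAlloc(&dptr, 1024);    // allocation lives in `ctx`
    cuMemFree(dptr);
    cuCtxDestroy(ctx);          // and destroyed explicitly
}
```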
