Is cuDevicePrimaryCtxRetain() used for having persistent CUDA context objects between multiple processes?
Problem description
Using only the driver API, for example, I profiled a single process (below); the cuCtxCreate overhead is nearly comparable to a 300 MB data copy to/from the GPU:
In the CUDA documentation here, it says (for cuDevicePrimaryCtxRetain): Retains the primary context on the device, creating it **if necessary**. Is this the expected behavior for repeated calls to the same process from the command line (such as running a process 1000 times to explicitly process 1000 different input images)? Does the device need CU_COMPUTEMODE_EXCLUSIVE_PROCESS for this to work as intended (re-using the same context when called multiple times)?
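For reference, the basic retain/release pattern within a single process looks roughly like this (a minimal sketch using the driver API; error checking omitted and device 0 assumed):

```c
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    CUcontext ctx;

    cuInit(0);                       /* initialize the driver API */
    cuDeviceGet(&dev, 0);            /* assume device 0 */

    /* Retains the device's primary context, creating it if necessary.
       The context is per-process: another process gets its own. */
    cuDevicePrimaryCtxRetain(&ctx, dev);
    cuCtxSetCurrent(ctx);

    /* ... cuMemAlloc, cuModuleLoad, kernel launches, ... */

    /* Every retain must be balanced by a release in the same process. */
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}
```

Note that the retain/release pair only has effect within the lifetime of this one process.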
For now, the image above stays the same even if I run that process multiple times. Even without using the profiler, timings show around 1 second completion time.
Edit: According to the documentation, the primary context is one per device per process. Does this mean there won't be a problem when using a multi-threaded single application?
What is the re-use time limit for the primary context? Is 1 second between processes okay, or does it have to be milliseconds to keep the primary context alive?
I'm already caching PTX code into a file, so the only remaining overhead looks like cuMemAlloc(), malloc() and cuMemHostRegister(); re-using the latest context from the last call to the same process would improve the timings considerably.
Edit-2: The documentation says, for cuDevicePrimaryCtxRetain: The caller must call cuDevicePrimaryCtxRelease() when done using the context. Is the caller here any process? Can I just use retain in the first called process and use release on the last called process in a list of hundreds of sequentially called processes? Does the system need a reset if the last process couldn't be launched and cuDevicePrimaryCtxRelease was not called?
Edit-3: Is primary context intended for this?
process-1: retain (creates)
process-2: retain (re-uses)
...
process-99: retain (re-uses)
process-100: 1 x retain and 100 x release (to decrease counter and unload at last)
- Everything is compiled for sm_30; the device is a GRID K520.
- The GPU is at boosted frequency during cuCtxCreate().
- The project is compiled 64-bit (release mode) on Windows Server 2016 with the Windows 7-compatible CUDA driver installation (that was the only combination that worked for K520 + Windows Server 2016).
Answer
Is cuDevicePrimaryCtxRetain() used for having persistent CUDA context objects between multiple processes?
No. It is intended to allow the driver API to bind to a context which a library that has used the runtime API has already lazily created, nothing more than that. Once upon a time it was necessary to create contexts with the driver API and then have the runtime bind to them. Now, with these APIs, you don't have to do that. You can, for example, see how this is done in Tensorflow here.
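A hedged sketch of that interop pattern (single GPU assumed, error checks omitted): a runtime API call lazily creates the primary context, and the driver API then binds to that same context via cuDevicePrimaryCtxRetain:

```c
#include <cuda.h>
#include <cuda_runtime.h>

void bind_driver_api(void)
{
    /* Any runtime API call (cudaFree(0) is the classic no-op trigger)
       lazily initializes the primary context in this process. */
    cudaFree(0);

    CUdevice dev;
    CUcontext ctx;
    cuInit(0);
    cuDeviceGet(&dev, 0);

    /* Bind to the same primary context the runtime is using, so driver
       API modules and launches share the runtime's context. */
    cuDevicePrimaryCtxRetain(&ctx, dev);
    cuCtxSetCurrent(ctx);

    /* ... cuModuleLoadData(cached PTX), cuLaunchKernel, ... */

    cuDevicePrimaryCtxRelease(dev);
}
```

Both handles refer to the same per-process primary context, so allocations and modules are visible to both APIs within that process.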
Does this mean there won't be a problem when using a multi-threaded single application?
The driver API has been fully thread-safe since about CUDA 2.0.
Is the caller here any process? Can I just use retain in the first called process and use release on the last called process in a list of hundreds of sequentially called processes?
No. Contexts are always unique to a given process. They can't be shared between processes in this way.
Is primary context intended for this?
process-1: retain (creates)
process-2: retain (re-uses)
...
process-99: retain (re-uses)
process-100: 1 x retain and 100 x release (to decrease counter and unload at last)
No.