Is cuDevicePrimaryCtxRetain() used for having persistent CUDA context objects between multiple processes?


Problem description

For example, using only the driver API, I profiled a single process (image below); the cuCtxCreate overhead is nearly comparable to a 300 MB data copy to/from the GPU:

In the CUDA documentation, it says (for cuDevicePrimaryCtxRetain) "Retains the primary context on the device, creating it **if necessary**". Is this the expected behavior for repeated invocations of the same process from the command line (such as running a process 1000 times to process 1000 different input images)? Does the device need CU_COMPUTEMODE_EXCLUSIVE_PROCESS to work as intended (re-use the same context when called multiple times)?

For now, the image above is the same even if I run that process multiple times. Even without the profiler, timings show a completion time of around 1 second.

Edit: According to the documentation, the primary context is one per device per process. Does this mean there won't be a problem when using a single multithreaded application?

What is the re-use time limit for the primary context? Is 1 second between processes okay, or does it have to be milliseconds to keep the primary context alive?

I'm already caching the PTX code in a file, so the only remaining overhead appears to be calls like cuMemAlloc(), malloc(), and cuMemHostRegister(); re-using the context from the previous invocation of the same process would therefore improve timings considerably.

Edit-2: The documentation for cuDevicePrimaryCtxRetain says "The caller must call cuDevicePrimaryCtxRelease() when done using the context." Is the caller here any process? Can I just retain in the first process and release in the last process in a list of hundreds of sequentially launched processes? Does the system need a reset if the last process couldn't be launched and cuDevicePrimaryCtxRelease was not called?

Edit-3:

Is the primary context intended for this?

process-1: retain (creates)
process-2: retain (re-uses)
...
process-99: retain (re-uses)
process-100: 1 x retain and 100 x release (to decrease counter and unload at last)





