CPU-GPU Parallel Programming (Python)
Question
Is there a way to run functions concurrently on the CPU and GPU (using Python)? I'm already using Numba for thread-level scheduling of compute-intensive functions on the GPU, but I now also need parallelism between the CPU and GPU. Once we ensure the GPU's shared memory has all the data it needs to start processing, I want to trigger the GPU launch and then, in parallel, run some functions on the host CPU.
I'm confident that the time the GPU takes to return its data is much longer than the time the CPU needs to finish its task, so by the time the GPU finishes processing, the CPU is already waiting to fetch the data back to the host. Is there a standard library or pattern to achieve this? Any pointers in this regard are appreciated.
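In CUDA, kernel launches (including Numba's) and many CuPy operations are asynchronous with respect to the host, so much of this overlap happens naturally. The general pattern the question describes can be illustrated with plain Python threads; the sketch below simulates the GPU task with a sleep, and `gpu_task`/`cpu_task` are hypothetical stand-ins, not part of any library:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def gpu_task(data):
    # Stand-in for GPU work; a real Numba/CuPy kernel launch returns
    # control to the host almost immediately, so the sleep here models
    # the device computing while the host does other work.
    time.sleep(0.2)
    return [x * 2 for x in data]

def cpu_task(data):
    # Host-side work that should run while the "GPU" is busy.
    return sum(data)

data = list(range(10))
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=1) as pool:
    gpu_future = pool.submit(gpu_task, data)  # "launch" the GPU work
    cpu_result = cpu_task(data)               # CPU runs concurrently
    gpu_result = gpu_future.result()          # "synchronize": wait for GPU
elapsed = time.perf_counter() - start

print(cpu_result)      # 45
print(gpu_result[:3])  # [0, 2, 4]
```

Because the CPU work finishes well inside the 0.2 s "GPU" window, the total elapsed time is roughly that of the GPU task alone rather than the sum of both.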
Answer
Thanks Robert and Ander. I was thinking along similar lines but wasn't sure. I verified that, up until I add some synchronization on task completion (e.g. cp.cuda.Device().synchronize() when using CuPy), the GPU and CPU are effectively running in parallel. Thanks again. A general flow with CuPy that makes gpu_function and cpu_function run in parallel looks like the following:
""" GPU has buffer full to start processing Frame N-1 """
tmp_gpu = cp.asarray(tmp_cpu)
gpu_function(tmp_gpu)
""" CPU receives Frame N over TCP socket """
tmp_cpu = cpu_function()
""" For instance we know cpu_function takes [a little] longer than gpu_function """
cp.cuda.Device().synchronize()
Of course, we could even hide the time spent transferring tmp_cpu to tmp_gpu by employing a ping-pong (double) buffer and an initial one-frame delay.
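The ping-pong idea can be sketched on the host side: the pipeline is primed with one frame, and on every iteration the GPU consumes frame N-1 while the CPU fills the other buffer with frame N. The helpers `receive_frame` and `process` below are hypothetical stand-ins for the TCP receive and the GPU kernel:

```python
def receive_frame(n):
    # Stand-in for receiving frame n over the TCP socket.
    return [n] * 4

def process(frame):
    # Stand-in for the GPU kernel.
    return [x * 2 for x in frame]

prev = receive_frame(0)   # initial one-frame delay primes the first buffer
results = []
for n in range(1, 4):
    # In the real pipeline these two steps overlap: the GPU processes
    # frame n-1 from one buffer while the CPU fills the other.
    out = process(prev)        # "GPU" consumes frame n-1
    nxt = receive_frame(n)     # "CPU" receives frame n
    results.append(out)
    prev = nxt                 # swap buffer roles for the next iteration

print(results)  # [[0, 0, 0, 0], [2, 2, 2, 2], [4, 4, 4, 4]]
```

Since the frame being received and the frame being processed live in different buffers, the host-to-device transfer of one frame can also be issued while the kernel for the previous frame is still running.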