Python Multiprocessing with PyCUDA


Question

I've got a problem that I want to split across multiple CUDA devices, but I suspect my current system architecture is holding me back;

What I've set up is a GPU class, with functions that perform operations on the GPU (strange that). These operations are of the style

for iteration in range(maxval):
    result[iteration]=gpuinstance.gpufunction(arguments,iteration)



I'd imagined that there would be N gpuinstances for N devices, but I don't know enough about multiprocessing to see the simplest way of applying this so that each device is asynchronously assigned, and strangely few of the examples that I came across gave concrete demonstrations of collating results after processing.
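For the collation side of that question, a minimal CPU-only sketch is possible with `multiprocessing.Pool`: each iteration is farmed out to a worker process and the results come back in input order. The `gpufunction` below is a hypothetical stand-in for the real `gpuinstance.gpufunction` call (in the real code each worker would additionally hold its own CUDA context):

```python
from multiprocessing import Pool

def gpufunction(iteration):
    # Hypothetical stand-in for the real GPU kernel launch; in the actual
    # application each worker process would own one CUDA device/context.
    return iteration * iteration

if __name__ == "__main__":
    maxval = 8
    with Pool(processes=2) as pool:
        # Pool.map preserves input order, so result[i] is the output
        # for iteration i -- this is the "collating" step.
        result = pool.map(gpufunction, range(maxval))
    print(result)
```

The point of the sketch is only the orchestration pattern: `Pool.map` handles both the fan-out and the ordered collection, so no manual result bookkeeping is needed.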

Can anyone give me any pointers in this area?

UPDATE

Thank you Kaloyan for your guidance in terms of the multiprocessing area; if CUDA wasn't specifically the sticking point I'd be marking you as answered. Sorry.

Before playing with this implementation, the gpuinstance class initiated the CUDA device with import pycuda.autoinit. But that didn't appear to work, throwing invalid context errors as soon as each (correctly scoped) thread met a cuda command. I then tried manual initialisation in the __init__ constructor of the class with...

pycuda.driver.init()
self.mydev = pycuda.driver.Device(devid)  # device id is passed at instantiation of the class
self.ctx = self.mydev.make_context()
self.ctx.push()

My assumption here is that the context is preserved between when the list of gpuinstances is created and when the threads use them, so each device is sitting pretty in its own context.

(I also implemented a destructor to take care of pop/detach cleanup)
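For concreteness, one way such a destructor might look (a sketch only, assuming a single push of the context as in the __init__ snippet above; this needs PyCUDA and a CUDA device to actually run):

```python
import pycuda.driver

class gpuinstance:
    def __init__(self, devid):
        pycuda.driver.init()
        self.mydev = pycuda.driver.Device(devid)
        # make_context() already makes the new context current on this thread
        self.ctx = self.mydev.make_context()

    def __del__(self):
        # Pop the context off this thread's context stack, then
        # release its resources.
        self.ctx.pop()
        self.ctx.detach()
```

Note that if an extra explicit push() is also issued in __init__, the cleanup would need a matching extra pop().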

Problem is, invalid context exceptions are still appearing as soon as the thread tries to touch CUDA.

Any ideas folks? And thanks for getting this far. Automatic upvotes for people working 'banana' into their answer! :P

Answer

You need to get all your bananas lined up on the CUDA side of things first, then think about the best way to get this done in Python [shameless rep whoring, I know].

The CUDA multi-GPU model is pretty straightforward pre 4.0 - each GPU has its own context, and each context must be established by a different host thread. So the idea in pseudocode is:


  1. The application starts, and the process uses the API to determine the number of usable GPUs (beware of the compute mode on Linux)

  2. The application launches a new host thread per GPU, passing it a GPU ID. Each thread implicitly/explicitly calls the equivalent of cuCtxCreate(), passing the GPU ID it has been assigned

  3. Profit!

In Python, this might look something like this:

import threading
from pycuda import driver

class gpuThread(threading.Thread):
    def __init__(self, gpuid):
        threading.Thread.__init__(self)
        # make_context() creates the context and makes it current
        self.ctx = driver.Device(gpuid).make_context()
        self.device = self.ctx.get_device()

    def run(self):
        print("%s has device %s, api version %s"
              % (self.name, self.device.name(), self.ctx.get_api_version()))
        # Profit!

    def join(self):
        # Release the context before rejoining the caller
        self.ctx.detach()
        threading.Thread.join(self)

driver.init()
ngpus = driver.Device.count()
for i in range(ngpus):
    t = gpuThread(i)
    t.start()
    t.join()

This assumes it is safe to just establish a context without any checking of the device beforehand. Ideally you would check the compute mode to make sure it is safe to try, then use an exception handler in case a device is busy. But hopefully this gives the basic idea.
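A hedged sketch of that pre-check (attribute and exception names are from PyCUDA's driver module; this needs a CUDA device to actually run):

```python
from pycuda import driver

driver.init()
for i in range(driver.Device.count()):
    dev = driver.Device(i)
    mode = dev.get_attribute(driver.device_attribute.COMPUTE_MODE)
    # In DEFAULT compute mode multiple host threads/processes may share the
    # device; in EXCLUSIVE/PROHIBITED modes make_context() can fail, so
    # guard the attempt with an exception handler.
    try:
        ctx = dev.make_context()
        ctx.pop()
        ctx.detach()
        print("device %d (%s) usable, compute mode %d" % (i, dev.name(), mode))
    except driver.Error:
        print("device %d busy or prohibited" % i)
```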
