How to use multiprocessing in python


Problem description

New to python and I want to do parallel programming in the following code, and want to use multiprocessing in python to do it. So how to modify the code? I've been searching method by using Pool, but found limited examples that I can follow. Anyone can help me? Thank you.

Note that setinner and setouter are two independent functions and that's where I want to use parallel programming to reduce the running time.

def solve(Q,G,n):
    i = 0
    tol = 10**-4

    while i < 1000:

        inneropt,partition,x = setinner(Q,G,n)
        outeropt = setouter(Q,G,n)

        if (outeropt - inneropt)/(1 + abs(outeropt) + abs(inneropt)) < tol:
            break

        node1 = partition[0]
        node2 = partition[1]

        G = updateGraph(G,node1,node2)
        if i == 999:
            print "Maximum iteration reached"
        i += 1  # advance the counter so the loop can terminate
    print inneropt

Recommended answer

It's hard to parallelize code that needs to mutate the same shared data from different tasks. So, I'm going to assume that setinner and setouter are non-mutating functions; if that's not true, things will be more complicated.

The first step is to decide what you want to do in parallel.

One obvious thing is to do the setinner and setouter at the same time. They're completely independent of each other, and always need to both get done. So, that's what I'll do. Instead of doing this:

inneropt,partition,x = setinner(Q,G,n)
outeropt = setouter(Q,G,n)

… we want to submit the two functions as tasks to the pool, then wait for both to be done, then get the results of both.

The concurrent.futures module (which requires a third-party backport in Python 2.x) makes it easier to do things like "wait for both to be done" than the multiprocessing module (which is in the stdlib in 2.6+), but in this case, we don't need anything fancy; if one of them finishes early, we don't have anything to do until the other finishes anyway. So, let's stick with multiprocessing.apply_async:

pool = multiprocessing.Pool(2) # we never have more than 2 tasks to run
while i < 1000:
    # start both tasks in parallel
    inner_result = pool.apply_async(setinner, (Q, G, n))
    outer_result = pool.apply_async(setouter, (Q, G, n))

    # sequentially wait for both tasks to finish and get their results
    inneropt, partition, x = inner_result.get()
    outeropt = outer_result.get()

    # the rest of your loop is unchanged
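
For comparison, a rough sketch of the same step using concurrent.futures (ProcessPoolExecutor, in the stdlib from 3.2+ or via the third-party backport on 2.x) might look like this:

import concurrent.futures

executor = concurrent.futures.ProcessPoolExecutor(max_workers=2)
while i < 1000:
    # submit both tasks, then block until each result is ready
    inner_future = executor.submit(setinner, Q, G, n)
    outer_future = executor.submit(setouter, Q, G, n)

    inneropt, partition, x = inner_future.result()
    outeropt = outer_future.result()

    # the rest of your loop is unchanged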

You may want to move the pool outside the function so it lives forever and can be used by other parts of your code. And if not, you almost certainly want to shut the pool down at the end of the function. (Later versions of multiprocessing let you just use the pool in a with statement, but I think that requires Python 3.2+, so you have to do it explicitly.)
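
A minimal sketch of that explicit shutdown, assuming the pool only needs to live inside solve:

import multiprocessing

def solve(Q, G, n):
    pool = multiprocessing.Pool(2)
    try:
        pass  # the while loop using pool.apply_async, as shown above
    finally:
        pool.close()  # stop accepting new tasks
        pool.join()   # wait for the worker processes to exit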

What if you want to do more work in parallel? Well, there's nothing else obvious to do here without restructuring the loop. You can't do updateGraph until you get the results back from setinner and setouter, and nothing else is slow here.

But if you could reorganize things so that each loop's setinner were independent of everything that came before (which may or may not be possible with your algorithm—without knowing what you're doing, I can't guess), you could push 2000 tasks onto the queue up front, then loop by just grabbing results as needed. For example:

pool = multiprocessing.Pool() # let it default to the number of cores
inner_results = []
outer_results = []
for i in range(1000):
    inner_results.append(pool.apply_async(setinner, (Q, G, n, i)))
    outer_results.append(pool.apply_async(setouter, (Q, G, n, i)))
for i in range(1000):
    inneropt, partition, x = inner_results.pop(0).get()
    outeropt = outer_results.pop(0).get()
    # the rest of your loop is the same as before

Of course you can make this fancier.

For example, let's say you rarely need more than a couple hundred iterations, so it's wasteful to always compute 1000 of them. You can just push the first N at startup, and push one more every time through the loop (or N more every N times) so you never do more than N wasted iterations—you can't get an ideal tradeoff between perfect parallelism and minimal waste, but you can usually tune it pretty nicely.
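
As a rough sketch of that idea (the keep-ahead count N and the push helper are illustrative names, and the per-iteration signatures of setinner and setouter are assumed from the previous example):

N = 50  # how many iterations to keep in flight; tune to taste
pool = multiprocessing.Pool()
inner_results = []
outer_results = []

def push(i):
    inner_results.append(pool.apply_async(setinner, (Q, G, n, i)))
    outer_results.append(pool.apply_async(setouter, (Q, G, n, i)))

for i in range(N):      # prime the queue with the first N iterations
    push(i)

for i in range(1000):
    if i + N < 1000:    # top the queue back up as results are consumed
        push(i + N)
    inneropt, partition, x = inner_results.pop(0).get()
    outeropt = outer_results.pop(0).get()
    # the rest of your loop is the same as before; breaking out early
    # wastes at most the N iterations already submitted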

Also, if the tasks don't actually take that long, but you have a lot of them, you may want to batch them up. One really easy way to do this is to use one of the map variants instead of apply_async; this can make your fetching code a tiny bit more complicated, but it makes the queuing and batching code completely trivial (e.g., to map each func over a list of 100 parameters with a chunksize of 10 is just two simple lines of code).
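
For instance, a sketch of that map-based batching, assuming setinner has been reworked to take a single argument and params is a hypothetical list of 100 such argument values:

pool = multiprocessing.Pool()

# map setinner over the 100 parameters, sending them to the workers in batches of 10
inner_results = pool.map(setinner, params, chunksize=10)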
