Proper way to use multiprocessing.Pool in a nested loop

Question

I am using multiprocessing.Pool to speed up an "embarrassingly parallel" loop. I actually have a nested loop, and am using the pool to speed up the inner loop. For example, without parallelizing the loop, my code would be as follows:

outer_array=[random_array1]
inner_array=[random_array2]
output=[empty_array]    

for i in outer_array:
    for j in inner_array:
        output[j][i]=full_func(j,i)

Parallelized:

import multiprocessing
from functools import partial

outer_array=[random_array1]
inner_array=[random_array2]
output=[empty_array]    

for i in outer_array:
    partial_func=partial(full_func,arg=i)     
    pool=multiprocessing.Pool() 
    output[:][i]=pool.map(partial_func,inner_array)
    pool.close()

My main question is whether this is correct: should I create the multiprocessing.Pool() inside the loop, or should I instead create the pool outside the loop, i.e.:

pool=multiprocessing.Pool()
for i in outer_array:
    partial_func=partial(full_func,arg=i)
    output[:][i]=pool.map(partial_func,inner_array)

Also, I am not sure whether I should include the line "pool.close()" at the end of each loop iteration in the second example above; what would be the benefits of doing so?

Thanks!

Recommended answer

Ideally, you should call the Pool() constructor exactly once - not over & over again. There are substantial overheads when creating worker processes, and you pay those costs every time you invoke Pool(). The processes created by a single Pool() call stay around! When they finish the work you've given to them in one part of the program, they stick around, waiting for more work to do.
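
As a minimal sketch of that advice, the pool below is created once and the same workers handle every pass of the outer loop. The body of full_func, the helper fill_output, the sample inputs, and the choice of processes=2 are assumptions made for illustration; they are not taken from the question.

```python
import multiprocessing
from functools import partial


def full_func(j, i):
    # Hypothetical stand-in for the question's full_func.
    return j * 10 + i


def fill_output(pool, outer_array, inner_array):
    # output[j_idx][i_idx] holds full_func(j, i), mirroring output[j][i] in the question.
    output = [[None] * len(outer_array) for _ in inner_array]
    for i_idx, i in enumerate(outer_array):
        # Bind the outer value; pool.map supplies each j as the first argument.
        partial_func = partial(full_func, i=i)
        column = pool.map(partial_func, inner_array)
        for j_idx, value in enumerate(column):
            output[j_idx][i_idx] = value
    return output


if __name__ == "__main__":
    # Create the pool once, before the outer loop; the workers are reused on every pass.
    with multiprocessing.Pool(processes=2) as pool:
        print(fill_output(pool, [1, 2, 3], [4, 5]))
```

Leaving the with block calls terminate() on the pool, which is harmless here because every map() call has already returned its results.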

As to Pool.close(), you should call that when - and only when - you're never going to submit more work to the Pool instance. So Pool.close() is typically called when the parallelizable part of your main program is finished. Then the worker processes will terminate when all work already assigned has completed.
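
A small sketch of what close() means in practice, assuming a hypothetical worker function square and made-up helper names: the pool stays open while batches are submitted, and once it is closed any further submission is rejected with ValueError.

```python
import multiprocessing


def square(x):
    # Hypothetical worker function, used only for illustration.
    return x * x


def run_all_batches(pool, batches):
    # Submit every batch first; the pool stays open between batches.
    results = [pool.map(square, batch) for batch in batches]
    # Nothing more will be submitted, so close the pool now.
    pool.close()
    return results


def still_accepts_work(pool):
    # A closed pool rejects new submissions by raising ValueError.
    try:
        pool.map(square, [1])
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    print(run_all_batches(pool, [[1, 2], [3, 4]]))
    print(still_accepts_work(pool))
    pool.join()
```

This is why calling close() inside the outer loop only makes sense if a fresh pool is created on each iteration, which is exactly the overhead the answer recommends avoiding.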

It's also excellent practice to call Pool.join() to wait for the worker processes to terminate. Among other reasons, there's often no good way to report exceptions in parallelized code (exceptions occur in a context only vaguely related to what your main program is doing), and Pool.join() provides a synchronization point that can report some exceptions that occurred in worker processes that you'd otherwise never see.
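
One possible shutdown pattern, sketched under assumptions: the worker function risky and the helper run_with_cleanup are hypothetical. It relies on two documented behaviours: close() (or terminate()) must be called before join(), and map() re-raises a worker's exception in the parent process.

```python
import multiprocessing


def risky(x):
    # Hypothetical worker that fails for one input.
    if x == 3:
        raise ValueError("bad input: 3")
    return x * 2


def run_with_cleanup(pool, items):
    try:
        # map() re-raises a worker's exception here, in the parent process.
        return pool.map(risky, items)
    except ValueError as exc:
        return "worker failed: " + str(exc)
    finally:
        # close() must come before join(); join() then waits for the workers to exit.
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_with_cleanup(multiprocessing.Pool(processes=2), [1, 2]))
    print(run_with_cleanup(multiprocessing.Pool(processes=2), [1, 2, 3]))
```

Putting close() and join() in a finally clause means the workers are waited on even when a task fails. With apply_async or map_async, by contrast, a worker's exception only surfaces when get() is called on the returned result object.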

Have fun :-)
