What's the difference between python's multiprocessing and concurrent.futures?


Question

A simple way of implementing multiprocessing in python is

from multiprocessing import Pool

def calculate(number):
    return number

if __name__ == '__main__':
    pool = Pool()
    result = pool.map(calculate, range(4))

An alternative implementation based on futures is

from concurrent.futures import ProcessPoolExecutor

def calculate(number):
    return number

with ProcessPoolExecutor() as executor:
    result = executor.map(calculate, range(4))

Both alternatives do essentially the same thing, but one striking difference is that we don't have to guard the code with the usual if __name__ == '__main__' clause. Is this because the implementation of futures takes care of this, or is there a different reason?

More broadly, what are the differences between multiprocessing and concurrent.futures? When is one preferred over the other?

My initial assumption that the if __name__ == '__main__' guard is only necessary for multiprocessing was wrong. Apparently, one needs this guard for both implementations on Windows, while it is not necessary on Unix systems.

Answer

You actually should use the if __name__ == "__main__" guard with ProcessPoolExecutor, too: It's using multiprocessing.Process to populate its Pool under the covers, just like multiprocessing.Pool does, so all the same caveats regarding picklability (especially on Windows), etc. apply.
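A minimal sketch of the guarded pattern described above (the function names are illustrative, not from the original question):

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # The worker function must live at module level so child
    # processes can import (i.e. pickle/unpickle) it.
    return n * n

def run_pool():
    # Pool creation happens in a function that is only called under
    # the __main__ guard, so child processes that re-import this
    # module (e.g. with the "spawn" start method on Windows) don't
    # recursively create more pools.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(square, range(4)))

if __name__ == '__main__':
    print(run_pool())  # [0, 1, 4, 9]
```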

I believe that ProcessPoolExecutor is meant to eventually replace multiprocessing.Pool, according to this statement made by Jesse Noller (a Python core contributor), when asked why Python has both APIs:

Brian and I need to work on the consolidation we intend(ed) to occur as people got comfortable with the APIs. My eventual goal is to remove anything but the basic multiprocessing.Process/Queue stuff out of MP and into concurrent.* and support threading backends for it.

For now, ProcessPoolExecutor is doing the exact same thing as multiprocessing.Pool with a simpler (and more limited) API. If you can get away with using ProcessPoolExecutor, use that, because I think it's more likely to get enhancements in the long-term.

Note that you can use all the helpers from multiprocessing with ProcessPoolExecutor, like Lock, Queue, Manager, etc. The main reasons to use multiprocessing.Pool are if you need initializer/initargs (though there is an open bug to get those added to ProcessPoolExecutor), or maxtasksperchild. Or you're running Python 2.7 or earlier and don't want to install (or require your users to install) the backport of concurrent.futures.
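To illustrate the initializer/initargs feature that (at the time of the answer) only multiprocessing.Pool supported, here is a hedged sketch; init_worker and add_offset are hypothetical names chosen for the example:

```python
from multiprocessing import Pool

_offset = 0  # per-worker state, set once by the initializer

def init_worker(offset):
    # Runs exactly once in each child process, before it starts
    # taking tasks, which is useful for expensive one-time setup.
    global _offset
    _offset = offset

def add_offset(n):
    return n + _offset

def run_with_initializer():
    # multiprocessing.Pool accepts initializer/initargs directly;
    # ProcessPoolExecutor only gained equivalent parameters later.
    with Pool(initializer=init_worker, initargs=(100,)) as pool:
        return pool.map(add_offset, range(4))

if __name__ == '__main__':
    print(run_with_initializer())  # [100, 101, 102, 103]
```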

Also worth noting: According to this question, multiprocessing.Pool.map outperforms ProcessPoolExecutor.map. Note that the performance difference is very small per work item, so you'll probably only notice a large performance difference if you're using map on a very large iterable. The reason for the performance difference is that multiprocessing.Pool will batch the iterable passed to map into chunks, and then pass the chunks to the worker processes, which reduces the overhead of IPC between the parent and children. ProcessPoolExecutor always passes one item from the iterable at a time to the children, which can lead to much slower performance with large iterables, due to the increased IPC overhead. The good news is this issue will be fixed in Python 3.5, as a chunksize keyword argument has been added to ProcessPoolExecutor.map, which can be used to specify a larger chunk size if you know you're dealing with large iterables. See this bug for more info.
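On Python 3.5+, the chunksize parameter mentioned above can be passed like this (a sketch; the worker function and sizes are arbitrary):

```python
from concurrent.futures import ProcessPoolExecutor

def double(n):
    return n * 2

def run_chunked():
    # chunksize batches items into groups of 100 per IPC round trip,
    # mirroring what multiprocessing.Pool.map does internally, which
    # cuts parent/child communication overhead on large iterables.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(double, range(1000), chunksize=100))

if __name__ == '__main__':
    print(run_chunked()[:5])  # [0, 2, 4, 6, 8]
```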
