What's the difference between python's multiprocessing and concurrent.futures?

Question
A simple way of implementing multiprocessing in python is
from multiprocessing import Pool

def calculate(number):
    return number

if __name__ == '__main__':
    pool = Pool()
    result = pool.map(calculate, range(4))
An alternative implementation based on futures is
from concurrent.futures import ProcessPoolExecutor

def calculate(number):
    return number

with ProcessPoolExecutor() as executor:
    result = executor.map(calculate, range(4))
Both alternatives do essentially the same thing, but one striking difference is that we don't have to guard the code with the usual if __name__ == '__main__' clause. Is this because the implementation of futures takes care of this, or is there a different reason?
More broadly, what are the differences between multiprocessing and concurrent.futures? When is one preferred over the other?
My initial assumption that the if __name__ == '__main__' guard is only necessary for multiprocessing was wrong. Apparently, one needs this guard for both implementations on Windows, while it is not necessary on Unix systems.
Answer
You actually should use the if __name__ == "__main__" guard with ProcessPoolExecutor, too: it's using multiprocessing.Process to populate its Pool under the covers, just like multiprocessing.Pool does, so all the same caveats regarding picklability (especially on Windows), etc. apply.
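To make the point concrete, a guarded version of the executor example might look like the following sketch (the worker function and doubling logic are illustrative, not from the original question). The guard matters because, under the "spawn" start method used on Windows, child processes re-import the main module, and unguarded pool creation would recurse.

```python
from concurrent.futures import ProcessPoolExecutor

def calculate(number):
    # Worker must be defined at module level so it is picklable by reference.
    return number * 2

def run():
    # Creating the pool inside a function that is only called under the
    # __main__ guard keeps re-importing children from spawning new pools.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(calculate, range(4)))

if __name__ == '__main__':
    print(run())  # [0, 2, 4, 6]
```

On Unix the default "fork" start method does not re-import the module, which is why the unguarded version appears to work there.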
I believe that ProcessPoolExecutor is meant to eventually replace multiprocessing.Pool, according to this statement made by Jesse Noller (a Python core contributor), when asked why Python has both APIs:
Brian and I need to work on the consolidation we intend(ed) to occur as people got comfortable with the APIs. My eventual goal is to remove anything but the basic multiprocessing.Process/Queue stuff out of MP and into concurrent.* and support threading backends for it.
For now, ProcessPoolExecutor is doing the exact same thing as multiprocessing.Pool with a simpler (and more limited) API. If you can get away with using ProcessPoolExecutor, use that, because I think it's more likely to get enhancements in the long-term.
Note that you can use all the helpers from multiprocessing with ProcessPoolExecutor, like Lock, Queue, Manager, etc. The main reason to use multiprocessing.Pool is if you need initializer/initargs (though there is an open bug to get those added to ProcessPoolExecutor), or maxtasksperchild. Or you're running Python 2.7 or earlier, and don't want to install (or require your users to install) the backport of concurrent.futures.
Also worth noting: according to this question, multiprocessing.Pool.map outperforms ProcessPoolExecutor.map. Note that the performance difference is very small per work item, so you'll probably only notice a large performance difference if you're using map on a very large iterable. The reason for the performance difference is that multiprocessing.Pool will batch the iterable passed to map into chunks, and then pass the chunks to the worker processes, which reduces the overhead of IPC between the parent and children. ProcessPoolExecutor always passes one item from the iterable at a time to the children, which can lead to much slower performance with large iterables, due to the increased IPC overhead. The good news is this issue will be fixed in Python 3.5, as a chunksize keyword argument has been added to ProcessPoolExecutor.map, which can be used to specify a larger chunk size if you know you're dealing with large iterables. See this bug for more info.