ProcessPoolExecutor from concurrent.futures way slower than multiprocessing.Pool
Question
I was experimenting with the shiny new concurrent.futures module introduced in Python 3.2, and I've noticed that, with almost identical code, using the pool from concurrent.futures (ProcessPoolExecutor) is way slower than using multiprocessing.Pool.
This is the version using multiprocessing:
```python
def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from multiprocessing import Pool, cpu_count
    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = Pool(processes=workers)
    result = pool.map(hard_work, range(100, 1000000))
```
And this is using concurrent.futures:
```python
def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from concurrent.futures import ProcessPoolExecutor, wait
    from multiprocessing import cpu_count
    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = ProcessPoolExecutor(max_workers=workers)
    result = pool.map(hard_work, range(100, 1000000))
```
Using a naïve factorization function taken from an article by Eli Bendersky, these are the results on my computer (i7, 64-bit, Arch Linux):
```
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:10] $ time python pool_multiprocessing.py
real    0m10.330s
user    1m13.430s
sys     0m0.260s

[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:29] $ time python pool_futures.py
real    4m3.939s
user    6m33.297s
sys     0m54.853s
```
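If you want to reproduce the comparison in a single script, here is a minimal sketch. `naive_factors` is a hypothetical trial-division stand-in, not the function from the article. The input range is smaller than in the question to keep the run short, and no particular timings are implied. The sketch assumes Python 3.4 or later (`os.cpu_count`, `Pool` used as a context manager). Note that `list()` is used to drain the lazy iterator returned by `Executor.map`.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def naive_factors(n):
    # Hypothetical stand-in for the article's function: plain trial division.
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def time_pool(numbers, workers):
    # Wall-clock time for multiprocessing.Pool.map over all inputs.
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(naive_factors, numbers)
    return time.perf_counter() - start


def time_executor(numbers, workers):
    # Wall-clock time for ProcessPoolExecutor.map; list() drains the
    # lazy result iterator so every result is actually collected.
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as executor:
        list(executor.map(naive_factors, numbers))
    return time.perf_counter() - start


if __name__ == '__main__':
    numbers = range(100, 100000)
    workers = os.cpu_count() or 1
    print('multiprocessing.Pool: %.2f s' % time_pool(numbers, workers))
    print('ProcessPoolExecutor:  %.2f s' % time_executor(numbers, workers))
```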
I cannot profile these with the Python profiler because I get pickle errors. Any ideas?
Answer
When using `map` from `concurrent.futures`, each element from the iterable is submitted separately to the executor, which creates a `Future` object for each call. It then returns an iterator which yields the results returned by the futures.
`Future` objects are rather heavyweight: they do a lot of work to support all the features they provide (callbacks, the ability to cancel, status checks, ...).
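A simplified model of that behaviour, written on top of `submit()`, makes the per-element cost visible. This is a sketch for illustration, not the CPython implementation of `Executor.map`. `naive_map` and `square` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor


def square(n):
    # Trivial hypothetical task; per-item cost is dominated by overhead.
    return n * n


def naive_map(executor, fn, iterable):
    # Simplified model of Executor.map (not the CPython source):
    # one submit() call, and therefore one Future object, per element.
    futures = [executor.submit(fn, item) for item in iterable]

    def results():
        # Yield results in input order, blocking on each Future in turn.
        for future in futures:
            yield future.result()

    return results()


if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(naive_map(executor, square, range(5))))  # [0, 1, 4, 9, 16]
```

All jobs are submitted eagerly inside `naive_map`, and only result retrieval is lazy. This mirrors the documented behaviour of `Executor.map`, which collects its iterables immediately rather than lazily.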
Compared to that, `multiprocessing.Pool` has much less overhead. It submits jobs in batches (reducing IPC overhead) and directly uses the result returned by the function. For big batches of jobs, multiprocessing is definitely the better option.
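To illustrate why batching matters, here is a minimal sketch that batches by hand on top of `ProcessPoolExecutor`, so that each `Future` and each IPC round trip covers a whole chunk instead of a single element. `chunked`, `_apply_to_chunk`, `batched_map` and `square` are hypothetical helpers, not part of either library. The sketch only approximates the idea behind `Pool`'s batching and does not reproduce its internals.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def square(n):
    # Hypothetical cheap task used for illustration.
    return n * n


def _apply_to_chunk(fn, chunk):
    # Runs inside a worker: one IPC round trip handles a whole batch.
    return [fn(item) for item in chunk]


def chunked(iterable, size):
    # Split any iterable into lists of at most `size` items.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def batched_map(executor, fn, iterable, size=1000):
    # One Future per batch instead of one per element.
    futures = [executor.submit(_apply_to_chunk, fn, chunk)
               for chunk in chunked(iterable, size)]
    for future in futures:
        yield from future.result()


if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        results = list(batched_map(executor, square, range(100, 1000000)))
    print(len(results))  # 999900
```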
Futures are great if you want to submit long-running jobs where the overhead isn't that important, where you want to be notified by callback or check from time to time whether they're done, or where you want to be able to cancel the execution individually.
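Here is a minimal sketch of those features with `ProcessPoolExecutor`: done-callbacks, non-blocking status checks and per-job cancellation. `slow_job`, `report` and `run_demo` are hypothetical placeholders. Whether `cancel()` succeeds depends on whether the job has already been picked up, so the sketch submits more jobs than there are workers.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def slow_job(seconds):
    # Hypothetical long-running task.
    time.sleep(seconds)
    return seconds


def report(future):
    # Done-callback: runs in the submitting process once the future settles.
    if future.cancelled():
        print('job was cancelled')
    else:
        print('job finished after', future.result(), 's')


def run_demo(executor, jobs=5, seconds=0.2):
    futures = [executor.submit(slow_job, seconds) for _ in range(jobs)]
    for future in futures:
        future.add_done_callback(report)
    # cancel() only succeeds for a job that has not started running yet.
    last_cancelled = futures[-1].cancel()
    # done() polls the status without blocking.
    print('first job done yet?', futures[0].done())
    results = [f.result() for f in futures if not f.cancelled()]
    return last_cancelled, results


if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as executor:
        print(run_demo(executor))
```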
Personal note
I can't really think of many reasons to use `Executor.map`: it doesn't give you any of the features of futures, except for the ability to specify a timeout. If you're just interested in the results, you're better off using one of `multiprocessing.Pool`'s map functions.
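For later readers: since Python 3.5, `Executor.map` also accepts a `chunksize` argument. It only has an effect for `ProcessPoolExecutor`, and the documentation notes that a large value can significantly improve performance on very long iterables compared with the default of 1. Below is a minimal sketch comparing it with `Pool.map`, assuming Python 3.5+. `square` is a hypothetical placeholder task, and the chunk size of 10000 is an arbitrary choice, not a tuned value.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def square(n):
    # Hypothetical cheap task.
    return n * n


def with_executor(numbers, chunksize):
    # Python 3.5+: chunksize sends items to workers in batches, like Pool does.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(square, numbers, chunksize=chunksize))


def with_pool(numbers, chunksize):
    # multiprocessing.Pool.map also takes a chunksize argument.
    with Pool() as pool:
        return pool.map(square, numbers, chunksize)


if __name__ == '__main__':
    numbers = range(100, 1000000)
    assert with_executor(numbers, 10000) == with_pool(numbers, 10000)
```

If `chunksize` is omitted, `Pool.map` picks a value automatically from the input length, whereas `Executor.map` defaults to sending one item at a time.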