ProcessPoolExecutor from concurrent.futures way slower than multiprocessing.Pool


Question

I was experimenting with the shiny new concurrent.futures module introduced in Python 3.2, and I've noticed that, with almost identical code, using the pool from concurrent.futures is way slower than using multiprocessing.Pool.

This is the version using multiprocessing:

def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from multiprocessing import Pool, cpu_count

    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = Pool(processes=workers)
    result = pool.map(hard_work, range(100, 1000000))

And this is the version using concurrent.futures:

def hard_work(n):
    # Real hard work here
    pass

if __name__ == '__main__':
    from concurrent.futures import ProcessPoolExecutor, wait
    from multiprocessing import cpu_count
    try:
        workers = cpu_count()
    except NotImplementedError:
        workers = 1
    pool = ProcessPoolExecutor(max_workers=workers)
    result = pool.map(hard_work, range(100, 1000000))

Using a naïve factorization function taken from an Eli Bendersky article, these are the results on my computer (i7, 64-bit, Arch Linux):

[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:10] $ time python pool_multiprocessing.py 

real    0m10.330s
user    1m13.430s
sys     0m0.260s
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:29] $ time python pool_futures.py 

real    4m3.939s
user    6m33.297s
sys     0m54.853s

I cannot profile these with the Python profiler because I get pickle errors. Any ideas?

Recommended answer

When using map from concurrent.futures, each element from the iterable is submitted separately to the executor, which creates a Future object for each call. It then returns an iterator which yields the results returned by the futures.
Future objects are rather heavyweight: they do a lot of work to support all the features they provide (callbacks, the ability to cancel, status checks, ...).
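
To make the per-element cost concrete, here is a rough sketch of what a submit-per-element map amounts to. The helper name map_via_submit and the square workload are made up for illustration; this is not the library's source, only an approximation of the behaviour described above. It accepts any executor, so it works the same way with a ProcessPoolExecutor.

```python
from concurrent.futures import Future, ProcessPoolExecutor


def square(n):
    """Stand-in workload; top-level so worker processes can unpickle it."""
    return n * n


def map_via_submit(executor, fn, iterable):
    """Hypothetical helper: one submit() call, and so one Future, per element."""
    futures = [executor.submit(fn, item) for item in iterable]
    # Every element now has its own Future object to create, track and resolve.
    assert all(isinstance(f, Future) for f in futures)

    def results():
        # Yield results in submission order, blocking on each future in turn.
        for future in futures:
            yield future.result()

    return results()


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(map_via_submit(executor, square, range(5))))
```

With the 999,900 inputs from the question, this pattern means roughly a million Future objects and a million separate trips through the inter-process queues.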

Compared to that, multiprocessing.Pool has much less overhead. It submits jobs in batches (reducing IPC overhead) and directly uses the result returned by the function. For big batches of jobs, multiprocessing is definitely the better option.
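
As a sketch of the batching point: Pool.map takes a chunksize argument that controls how many items go to a worker per task. Since Python 3.5 (so later than the version in the question), ProcessPoolExecutor.map accepts a chunksize argument as well, defaulting to 1. The hard_work body, the worker count and the chunk size below are placeholders I picked for illustration, not tuned values.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def hard_work(n):
    """Placeholder workload (an assumption; the question used a factorization)."""
    return n % 7


def run_with_pool(numbers, workers=2, chunksize=1000):
    # Pool.map splits the input into chunks; each chunk is one task for a worker.
    with Pool(processes=workers) as pool:
        return pool.map(hard_work, numbers, chunksize=chunksize)


def run_with_executor(numbers, workers=2, chunksize=1000):
    # Python 3.5+: with chunksize > 1 the executor submits one task per chunk
    # instead of one per item.
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(hard_work, numbers, chunksize=chunksize))


if __name__ == "__main__":
    numbers = range(100, 20000)
    assert run_with_pool(numbers) == run_with_executor(numbers)
    print("both pools returned", len(numbers), "results")
```

Whether a larger chunksize closes the gap on a given machine is something to measure rather than assume.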

Futures are great if you want to submit long-running jobs where the overhead isn't that important, where you want to be notified by callback or check from time to time to see if they're done, or where you want to be able to cancel the execution individually.
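
A minimal sketch of those features, assuming a few sleep-based jobs (long_job, describe and report are hypothetical names used only here). Whether the cancel() call succeeds depends on timing, because a job that has already been handed to a worker can no longer be cancelled, so the script prints the outcome instead of assuming it.

```python
import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def long_job(seconds):
    """Hypothetical long-running job; sleeping stands in for real work."""
    time.sleep(seconds)
    return seconds


def describe(future):
    """Summarize a finished future without raising on cancellation."""
    if future.cancelled():
        return "cancelled"
    return "finished with {}".format(future.result())


def report(future):
    # Done callbacks run in the submitting process, not in the worker.
    print("callback:", describe(future))


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as executor:
        futures = [executor.submit(long_job, s) for s in (0.2, 0.1, 0.1, 0.1)]
        for future in futures:
            future.add_done_callback(report)

        # cancel() only succeeds if the job has not started running yet.
        print("last job cancelled:", futures[-1].cancel())

        # Check a status without blocking.
        print("first job done yet?", futures[0].done())

        # Collect in completion order; cancelled futures are yielded too.
        for future in as_completed(futures):
            print("collected:", describe(future))
```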

Personal note:

I can't really think of many reasons to use Executor.map, since it doesn't give you any of the features of futures, except for the ability to specify a timeout. If you're just interested in the results, you're better off using one of multiprocessing.Pool's map functions.
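
For completeness, a small sketch of the timeout case. The helper collect_until_timeout and the sleep-based slow_identity job are made up for illustration. The timeout passed to Executor.map is measured from the map() call itself, not per item, and a TimeoutError is raised while iterating once it has elapsed.

```python
import time
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError


def slow_identity(seconds):
    """Hypothetical job: sleep for the given time, then return it."""
    time.sleep(seconds)
    return seconds


def collect_until_timeout(executor, delays, timeout):
    """Hypothetical helper: gather map() results until the deadline passes."""
    results = []
    try:
        # The deadline is counted from this map() call, not from each item.
        for value in executor.map(slow_identity, delays, timeout=timeout):
            results.append(value)
    except FuturesTimeoutError:
        # Results that were not ready in time are simply left out.
        pass
    return results


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(collect_until_timeout(executor, [0, 1.5], timeout=0.5))
```

Note that leaving the with block still waits for any job that is already running, so the timeout bounds how long you wait for results, not how long the workers run.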
