Why does joblib.Parallel() take much more time than a non-paralleled computation? Shouldn't Parallel() run faster than a non-paralleled computation?
Question
The joblib module provides a simple helper class to write parallel for loops using multiprocessing.
This code uses a list comprehension to do the job:
import time
from math import sqrt
from joblib import Parallel, delayed
start_t = time.time()
list_comprehension = [sqrt(i ** 2) for i in range(1000000)]
print('list comprehension: {}s'.format(time.time() - start_t))
It takes about 0.51 s:

list comprehension: 0.5140271186828613s
This code uses the joblib.Parallel() constructor:
start_t = time.time()
list_from_parallel = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(1000000))
print('Parallel: {}s'.format(time.time() - start_t))
It takes about 31 s:

Parallel: 31.3990638256073s
Why is that? Shouldn't Parallel() be faster than a non-paralleled computation?
Here is part of the cpuinfo output:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU @ 2.20GHz
stepping : 0
microcode : 0x1
cpu MHz : 2200.000
cache size : 56320 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
Answer
Q : Shouldn't Parallel() become faster than a non-paralleled computation?
Well, that depends; it depends a lot on circumstances (be it joblib.Parallel() or any other way).

No benefit comes for free (all such promises have failed to deliver, ever since 1917...).
Plus, it is very easy to end up paying way more (on spawning processes to start multiprocessing) than you receive back (the speedup expected over the original workflow), so due care is a must.
Revisit Amdahl's law and the criticism of how it treats process-scheduling effects (speedup achieved from reorganising the process-flow and using, at least in some part, parallel process-scheduling).

The original Amdahl formulation was not explicit about the so-called add-on "costs" one has to pay for moving into parallel work-flows, costs that are not in the budget of the original, pure-[SERIAL] flow-of-work.
1) Process instantiation has always been expensive in Python, as it first has to replicate the main Python session: n_jobs (here 2) copies, each an O/S-driven RAM allocation plus an O/S-driven copy of the RAM image of the main Python session. (Thread-based multiprocessing yields a negative speedup, because the GIL-lock still re-[SERIAL]-ises the work-steps among all spawned threads, so you gain nothing while paying immense add-on costs for spawning plus for every GIL-acquire/GIL-release step-dance. That is an awful antipattern for compute-intensive tasks; it may help mask some I/O-related latencies, but it is definitely not a case for compute-intensive workloads.)
2) Add-on costs for parameter transfer: you have to move some data from the main process to the new ones. This takes add-on time, and you have to pay this add-on cost, which is not present in the original, pure-[SERIAL] workflow.
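The transfer in point 2) starts with serialisation. A small illustrative sketch (mine, not from the answer) that times just the pickling side of shipping many tiny arguments, which is a lower bound on the real transfer cost:

```python
import pickle
import time

def serialisation_cost(n=100_000):
    """Time the pickling of n small int arguments, as if each were
    shipped to a worker process one by one; the actual IPC transfer
    would cost extra on top of this."""
    t0 = time.time()
    payload_bytes = sum(len(pickle.dumps(i ** 2)) for i in range(n))
    return time.time() - t0, payload_bytes

if __name__ == "__main__":
    elapsed, payload = serialisation_cost()
    print("pickled {} bytes in {:.4f} s".format(payload, elapsed))
```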
3) Add-on costs for results-return transfer: you have to move some data from the new processes back to the originating (main) process. Again, this takes add-on time that is not present in the original, pure-[SERIAL] workflow.
4) Add-on costs for any data interchange (better to avoid any temptation to use this inside parallel workflows. Why? a) It blocks, and b) it is expensive: you have to pay even more add-on costs to get any further, costs which you do not pay in the pure-[SERIAL] original workflow).
Q : Why does joblib.Parallel() take much more time than a non-paralleled computation?
Simply because you have to pay way, way more to launch the whole orchestrated circus than you receive back from such a parallel work-flow organisation. The amount of work in math.sqrt( <int> ) is far too small to ever justify the relatively immense costs of spawning two full copies of the original Python (main) session, plus all the orchestration of dances to send each and every ( <int> ) from (main) to there, and to retrieve each resulting ( <float> ) from the (joblib.Parallel() process) back to (main).
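One common remedy, sketched here as an illustration rather than as part of the original answer, is to hand each worker one large chunk of the range instead of a million one-element tasks, so the fixed costs in 1) through 3) are paid only n_jobs times:

```python
from math import sqrt
from joblib import Parallel, delayed

def sqrt_chunk(lo, hi):
    # One task now carries a large slice of the work, amortising the
    # per-task spawning/transfer overhead over many computations.
    return [sqrt(i ** 2) for i in range(lo, hi)]

def parallel_chunked(n, n_jobs=2):
    step = n // n_jobs
    bounds = [(k * step, n if k == n_jobs - 1 else (k + 1) * step)
              for k in range(n_jobs)]
    chunks = Parallel(n_jobs=n_jobs)(
        delayed(sqrt_chunk)(lo, hi) for lo, hi in bounds)
    # Flatten the per-worker chunks back into one result list.
    return [x for chunk in chunks for x in chunk]
```

Even so, for work this cheap the serial list comprehension will usually still win; chunking only helps once each task's useful work dwarfs the add-on costs.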
Your raw benchmarking times provide a sufficient comparison of the accumulated costs of producing the same result:
[SERIAL]-<iterator> feeding a [SERIAL]-processing storing into list[]: 0.51 [s]
[SERIAL]-<iterator> feeding [PARALLEL]-processing storing into list[]: 31.39 [s]
A raw estimate says that about 30.9 seconds were "wasted" doing the same (small) amount of work, simply by forgetting about the add-on costs one always has to pay.
Benchmark, benchmark, benchmark the actual code... (prototype)
If you are interested in benchmarking these costs (how long it takes, in [us], to do 1), 2) or 3), i.e. how much you have to pay before any useful work even starts), benchmarking templates have been posted to test and validate these principal costs on your own platform. Only then can you decide what the minimum work-package is that justifies these unavoidable expenses and yields a "positive" speedup greater (best a lot greater) than >> 1.0000 when compared to the pure-[SERIAL] original.
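As a starting point, a minimal template of my own (assuming the same sqrt workload as above) that compares serial and parallel timings across growing problem sizes, to hunt for the break-even work-package:

```python
import time
from math import sqrt
from joblib import Parallel, delayed

def timed(fn, *args):
    """Return the wall-clock seconds fn(*args) takes."""
    t0 = time.time()
    fn(*args)
    return time.time() - t0

def serial(n):
    return [sqrt(i ** 2) for i in range(n)]

def parallel(n):
    return Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(n))

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print("n={:>7}: serial {:8.4f} s | parallel {:8.4f} s".format(
            n, timed(serial, n), timed(parallel, n)))
```

For this tiny per-item workload the parallel column will typically stay slower at every n; the template is there to let you verify that on your own hardware before committing to a parallel design.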