Why does joblib.Parallel() take much more time than a non-paralleled computation? Shouldn't Parallel() run faster than a non-paralleled computation?


Problem Description

The joblib module provides a simple helper class to write parallel for loops using multiprocessing.

This code uses a list comprehension to do the job:

import time
from math import sqrt
from joblib import Parallel, delayed

start_t = time.time()
list_comprehension = [sqrt(i ** 2) for i in range(1000000)]
print('list comprehension: {}s'.format(time.time() - start_t))

It takes about 0.51 s:

list comprehension: 0.5140271186828613s

This code uses the joblib.Parallel() constructor:

start_t = time.time()
list_from_parallel = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(1000000))
print('Parallel: {}s'.format(time.time() - start_t))

It takes about 31 s:

Parallel: 31.3990638256073s

Why is that? Shouldn't Parallel() be faster than a non-paralleled computation?

Here is part of the cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU @ 2.20GHz
stepping        : 0
microcode       : 0x1
cpu MHz         : 2200.000
cache size      : 56320 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes

Answer

Q : Shouldn't Parallel() be faster than a non-paralleled computation?

Well, that depends; it depends a lot on the circumstances (be it joblib.Parallel() or any other way).

There are no benefits for free (all such promises have failed to come true, ever since 1917 ...).

Plus,
it is very easy to end up
paying way more (on spawning processes to start multiprocessing)
than you receive back (the speedup expected over the original workflow) ... so due care is a must

Revisit Amdahl's law, its revision and the criticism concerning process-scheduling effects (a speedup achieved from a reorganisation of the process-flow that uses, at least in some part, parallel process-scheduling).

The original Amdahl's formulation was not explicit about the so-called add-on "costs" one has to pay for going into parallel work-flows, costs that are not in the budget of the original, pure-[SERIAL] flow of work.
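
As a sketch of what such an overhead-aware estimate can look like (an illustration of the revised formula's idea, not the answer's own code): a lump-sum overhead term is added to the classic speedup formula, so that the spawning / transfer costs a pure-[SERIAL] run never pays are accounted for.

def speedup(T_serial, p, n_jobs, T_overhead):
    # classic Amdahl: the fraction p runs on n_jobs workers, but only after
    # paying T_overhead seconds of spawn / transfer add-on costs
    T_parallel = (1 - p) * T_serial + (p * T_serial) / n_jobs + T_overhead
    return T_serial / T_parallel

# with the numbers from the question: ~0.51 [s] of work, ~30.9 [s] of overhead
print(speedup(T_serial=0.51, p=1.0, n_jobs=2, T_overhead=30.9))  # ~0.016, i.e. ~61x slower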

1) Process instantiation has always been expensive in python, as it first has to replicate as many copies as requested (O/S-driven RAM allocations sized for the n_jobs(2) copies, plus O/S-driven copying of the RAM image of the main python session). Thread-based multiprocessing yields a negative speedup, as the GIL-lock still re-[SERIAL]-ises the work-steps among all spawned threads, so you gain nothing, while you have paid immense add-on costs for the spawning plus for each add-on GIL-acquire/GIL-release step-dancing step - an awful antipattern for compute-intensive tasks; it may help mask some cases of I/O-related latencies, but it is definitely not a fit for computing-intensive workloads. (A small sketch measuring these add-on costs in isolation follows the list below.)

2) Add-on costs for parameters' transfer - you have to move some data from the main process towards the new ones. It costs add-on time, and you have to pay this add-on cost, which is not present in the original, pure-[SERIAL] workflow.

3) Add-on costs for the results' return transfer - you have to move some data from the new processes back to the originating (main) one. Again this costs add-on time that is not present in the original, pure-[SERIAL] workflow.

4) Add-on costs for any data interchange (better avoid any temptation to use it in parallel workflows - why? a) it blocks, plus b) it is expensive, and you have to pay ever more add-on costs to get any further, which you do not pay in the pure-[SERIAL] original workflow).
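
To put rough numbers on points 1) to 3), here is a hedged, minimal sketch (not part of the original answer): a no-op task-set times the pure process-spawning overhead from point 1), and a pickle round-trip approximates the SER/DES transfer costs from points 2) and 3). The measured figures will differ per O/S and per platform.

import pickle
import time
from multiprocessing import Pool

def noop(x):
    return x                             # zero useful work: any time measured is pure overhead

if __name__ == '__main__':
    # 1) spawn costs: pay the process-instantiation price for a trivial task-set
    start_t = time.time()
    with Pool(processes=2) as pool:
        pool.map(noop, range(4))
    print('spawn + noop round-trip: {}s'.format(time.time() - start_t))

    # 2) + 3) transfer costs: every parameter sent to a worker and every result
    #          returned crosses a process boundary via pickling (SER/DES both ways)
    payload = list(range(1000000))       # a stand-in for params / results
    start_t = time.time()
    blob = pickle.dumps(payload)         # cost paid on the sending side
    restored = pickle.loads(blob)        # cost paid on the receiving side
    print('pickle round-trip: {}s for {} bytes'.format(time.time() - start_t, len(blob)))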

Q : Why does joblib.Parallel() take much more time than a non-paralleled computation?

Simply because you have to pay way, way more to launch the whole orchestrated circus than you will ever receive back from such a parallel work-flow organisation (the amount of work in math.sqrt( <int> ) is way too small to justify the relatively immense costs of spawning 2 full copies of the original python-(main)-session, plus all the orchestration of dances needed to send just each and every ( <int> )-from-(main)-there and to retrieve each resulting ( <float> )-from-(joblib.Parallel()-process)-back-to-(main)).
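
The practical remedy implied here is to make each dispatched task big enough to amortise these costs. A minimal sketch (assuming a hypothetical chunked_sqrt() helper, not part of the original post): each worker call now carries a whole chunk of sqrt() calls instead of a single one, so the per-task SER/DES and scheduling costs are paid once per chunk.

import time
from math import sqrt
from joblib import Parallel, delayed

def chunked_sqrt(lo, hi):
    # one dispatched task now computes a whole slice, not a single sqrt()
    return [sqrt(i ** 2) for i in range(lo, hi)]

if __name__ == '__main__':
    N, CHUNK = 1000000, 250000
    start_t = time.time()
    chunks = Parallel(n_jobs=2)(delayed(chunked_sqrt)(lo, min(lo + CHUNK, N))
                                for lo in range(0, N, CHUNK))
    list_from_parallel = [x for chunk in chunks for x in chunk]
    print('chunked Parallel: {}s'.format(time.time() - start_t))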

Your raw benchmarking times provide a sufficient comparison of the accumulated costs of arriving at the same result:

[SERIAL]-<iterator> feeding a [SERIAL]-processing storing into list[]:  0.51 [s]
[SERIAL]-<iterator> feeding [PARALLEL]-processing storing into list[]: 31.39 [s]

A raw estimate says that about 30.9 seconds were "wasted" to do the same (small) amount of work, just by forgetting about the add-on costs one always has to pay.
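
Dividing that waste over the one million dispatched tasks puts the per-task overhead into perspective (a back-of-the-envelope illustration, not part of the original post):

overhead_total = 31.39 - 0.51            # ~30.88 [s] of pure add-on costs
per_task = overhead_total / 1000000      # one dispatched task per list element
print('~{:.1f} us of overhead per sqrt() call'.format(per_task * 1e6))   # ~30.9 us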

Benchmark, benchmark, benchmark the actual code ... (prototype)

If you are interested in benchmarking these costs - how long it takes in [us] (i.e. how much you have to pay before any useful work even starts) to do 1), 2) or 3) - benchmarking templates have been posted for testing and validating these principal costs on one's own platform. Only then can one decide on the minimum work-package that can justify these unavoidable expenses and yield a "positive" speedup greater ( best a lot greater ) than >> 1.0000 when compared to the pure-[SERIAL] original.
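
A minimal sketch of such a template (written in the spirit of the ones referred to, not the originals): dispatch a zero-work task through joblib.Parallel() and watch how the pure dispatch plus transfer costs scale with the number of tasks.

import time
from joblib import Parallel, delayed

def identity(x):
    return x                             # zero useful work: measures pure dispatch + transfer costs

if __name__ == '__main__':
    for n_tasks in (1, 10, 100, 1000):
        start_t = time.time()
        Parallel(n_jobs=2)(delayed(identity)(i) for i in range(n_tasks))
        print('{:>5d} tasks: {:.3f} [s]'.format(n_tasks, time.time() - start_t))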
