Why does the get() operation in multiprocessing.Pool.map_async take so long?

import multiprocessing as mp
import numpy as np

pool   = mp.Pool( processes = 4 )
inp    = np.linspace( 0.01, 1.99, 100 )
result = pool.map_async( func, inp ) #Line1 ( func is some Python function which acts on input )
output = result.get()                #Line2

So, I was trying to parallelize some code in Python, using a .map_async() method on a multiprocessing.Pool() instance.

I noticed that while
Line1 takes around a thousandth of a second,
Line2 takes about .3 seconds.

Is there a better way to do this or a way to get around the bottleneck caused by Line2,
or
am I doing something wrong here?

( I am rather new to this. )

Solution

Am I doing something wrong here?

Do not panic, many users do the very same - paying more than they receive.

This is a common lesson, not about using some "promising" syntax-constructor, but about paying the actual costs of using it.

The story is long, but the effect is straightforward: you expected a low-hanging fruit, yet had to pay the immense costs of process instantiation, work-package re-distribution and collection of results, all that circus just for doing but a few rounds of func() calls.


Wow?
Stop!
Wasn't parallelisation brought in so as to SPEEDUP the processing?!?

Well, who told you that any such ( potential ) speedup is for free?

Let's be quantitative and measure the actual code-execution time, rather than emotions, right?

Benchmarking is always a fair move.
It helps us, mortals, escape from mere expectations
and get ourselves onto knowledge supported by quantitative records of evidence:

from zmq import Stopwatch; aClk = Stopwatch() # this is a handy tool to do so
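
( .start() marks a time-stamp and .stop() returns the elapsed time in [us], which is why all figures below are reported in microseconds; a micro-example, where the summation payload is just an arbitrary assumption: )

aClk.start()                                  # mark the start time-stamp
_ = sum( i * i for i in range( 10**6 ) )      # any work one wishes to time
elapsed_us = aClk.stop()                      # returns the elapsed time in [us]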


AS-IS test:

Before moving forwards, one ought to record this trio:

>>> aClk.start(); _ = [   func( SEQi ) for SEQi in inp ]; aClk.stop() # [SEQ] 
>>> HowMuchWillWePAY2RUN( func, 4, 100 )                              # [RUN]
>>> HowMuchWillWePAY2MAP( func, 4, 100 )                              # [MAP]

This will set the span of the performance envelopes: from a pure-[SERIAL] [SEQ]-of-calls baseline, to an un-optimised joblib.Parallel(), or to any other tool, if one wishes to extend the experiment, such as the said multiprocessing.Pool().


Test-case A:

Intent:
so as to measure the cost of a { process | job }-instantiation, we need a NOP-work-package payload, one that will spend almost nothing "there", yet will return "back", and will not require paying any additional add-on costs ( be it for any input parameters' transmission or for returning any value )

def a_NOP_FUN( aNeverConsumedPAR ):
    """                                                 __doc__
    The intent of this FUN() is indeed to do nothing at all,
                             so as to be able to benchmark
                             all the process-instantiation
                             add-on overhead costs.
    """
    pass


So, the setup-overhead add-on costs comparison is here:

#-------------------------------------------------------<function a_NOP_FUN
[SEQ]-pure-[SERIAL] worked within ~   37 ..     44 [us] on this localhost
[MAP]-just-[CONCURRENT] tool        2536 ..   7343 [us]
[RUN]-just-[CONCURRENT] tool      111162 .. 112609 [us]


Using a strategy of
joblib.delayed() on joblib.Parallel() task-processing:

def HowMuchWillWePAY2RUN( aFun2TEST = a_NOP_FUN, JOBS_TO_SPAWN = 4, RUNS_TO_RUN = 10 ):
    import joblib                   # joblib import was missing in the original listing
    from zmq import Stopwatch; aClk = Stopwatch()
    try:
         aClk.start()
         joblib.Parallel(  n_jobs = JOBS_TO_SPAWN
                          )( joblib.delayed( aFun2TEST )
                                           ( aFunPARAM )
                                       for ( aFunPARAM )
                                       in  range( RUNS_TO_RUN )
                             )
    except:
         pass
    finally:
         try:
             _ = aClk.stop()
         except:
             _ = -1
             pass
    pass;  pMASK = "CLK:: {0:_>24d} [us] @{1: >4d}-JOBs ran{2: >6d} RUNS {3:}"
    print( pMASK.format( _,
                         JOBS_TO_SPAWN,
                         RUNS_TO_RUN,
                         " ".join( repr( aFun2TEST ).split( " ")[:2] )
                         )
            )


Using a strategy of a lightweight
.map_async() method on a multiprocessing.Pool() instance:

def HowMuchWillWePAY2MAP( aFun2TEST = a_NOP_FUN, PROCESSES_TO_SPAWN = 4, RUNS_TO_RUN = 1 ):
    from zmq import Stopwatch; aClk = Stopwatch()
    try:
         import numpy           as np
         import multiprocessing as mp

         pool = mp.Pool( processes = PROCESSES_TO_SPAWN )
         inp  = np.linspace( 0.01, 1.99, 100 )

         aClk.start()
         for i in range( RUNS_TO_RUN ):   # range() on Python 3 ( originally xrange() )
             pass;    result = pool.map_async( aFun2TEST, inp )
             output = result.get()
         pass
    except:
         pass
    finally:
         try:
             _ = aClk.stop()
         except:
             _ = -1
             pass
    pass;  pMASK = "CLK:: {0:_>24d} [us] @{1: >4d}-PROCs ran{2: >6d} RUNS {3:}"
    print( pMASK.format( _,
                         PROCESSES_TO_SPAWN,
                         RUNS_TO_RUN,
                         " ".join( repr( aFun2TEST ).split( " ")[:2] )
                         )
            )


So,
the first set of pain and surprises
comes straight from the actual cost-of-doing-NOTHING in a concurrent pool of joblib.Parallel():

 CLK:: __________________117463 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN
 CLK:: __________________111182 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110229 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110095 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________111794 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110030 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________110697 [us] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: _________________4605843 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________336208 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________298816 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________355492 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________320837 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________308365 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________372762 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________304228 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________337537 [us] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
 CLK:: __________________941775 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: __________________987440 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: _________________1080024 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: _________________1108432 [us] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
 CLK:: _________________7525874 [us] @ 123-JOBs ran100000 RUNS <function a_NOP_FUN

So, this scientifically fair and rigorous test has started from this simplest-ever case, already showing the benchmarked costs of all the associated code-execution setup overheads: the smallest-ever joblib.Parallel() penalty, its sine qua non.

This forwards us in the direction where real-world algorithms live: best to next add larger and larger "payload" sizes into the testing loop.


Now, we know the penalty
for going into a "just"-[CONCURRENT] code-execution - and next?

Using this systematic and lightweight approach, we may go forwards in the story, as we will need to also benchmark the add-on costs and other Amdahl's Law indirect effects of { remote-job-PAR-XFER(s) | remote-job-MEM.alloc(s) | remote-job-CPU-bound-processing | remote-job-fileIO(s) }
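
( For orientation, a sketch of an overhead-strict re-formulation of Amdahl's Law; the function name and the symbols p, N, oSU, oTD are illustrative assumptions, not from the original post: )

def amdahl_overhead_strict( p = 0.95, N = 4, oSU = 0.01, oTD = 0.01 ):
    """
    Overhead-strict Amdahl's Law speedup estimate:
    p   ... parallelisable fraction of the original [SERIAL] runtime
    N   ... number of concurrent workers
    oSU ... setup    add-on overheads, as a fraction of the [SERIAL] runtime
    oTD ... teardown add-on overheads, as a fraction of the [SERIAL] runtime
    """
    return 1. / ( ( 1 - p ) + oSU + p / N + oTD )

With oSU + oTD anywhere near the [RUN] / [MAP] figures benchmarked above ( against a ~ 40 [us] [SEQ] baseline ), the estimate drops far below 1.0, i.e. a net slowdown.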

A function template like this may help in re-testing ( as you see there will be a lot to re-run, while the O/S noise and some additional artifacts will step into the actual cost-of-use patterns ):
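
( a sketch of such a template: it re-uses the HowMuchWillWePAY2RUN() helper from above and sweeps a growing payload, e.g. the MEM-allocator from Test-case B below; the name, the sweep values and the REPETITIONS count are illustrative assumptions: )

from functools import partial

def reTEST_GROWING_PAYLOADS( aFun2TEST,                       # e.g. a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR below
                             JOBS_TO_SPAWN = 4,
                             RUNS_TO_RUN   = 10,
                             SIZE1Ds       = ( 10, 50, 100 ), # payload scales to sweep
                             REPETITIONS   = 3                # re-runs, to see the O/S-noise spread
                             ):
    for aSIZE1D in SIZE1Ds:
        for _ in range( REPETITIONS ):
            HowMuchWillWePAY2RUN( partial( aFun2TEST, SIZE1D = aSIZE1D ),
                                  JOBS_TO_SPAWN,
                                  RUNS_TO_RUN
                                  )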


Test-case B:

Once we have paid the up-front costs, the next most common mistake is to forget the costs of memory allocations. So, let's test it:

def a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR( aNeverConsumedPAR, SIZE1D = 1000 ):
    """                                                 __doc__
    The intent of this FUN() is to do nothing but
                             a MEM-allocation
                             so as to be able to benchmark
                             all the process-instantiation
                             add-on overhead costs.
    """
    import numpy as np              # yes, deferred import, libs do defer imports
    aMemALLOC = np.zeros( ( SIZE1D, #       so as to set
                            SIZE1D, #       realistic ceilings
                            SIZE1D, #       as how big the "Big Data"
                            SIZE1D  #       may indeed grow into
                            ),
                          dtype = np.float64,
                          order = 'F'
                          )         # .ALLOC + .SET
    aMemALLOC[2,3,4,5] = 8.7654321  # .SET
    aMemALLOC[3,3,4,5] = 1.2345678  # .SET

    return aMemALLOC[2:3,3,4,5]
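
( beware the default scaling: a quick back-of-the-envelope check shows the default SIZE1D = 1000 would request an ~ 8 [TB] block: )

SIZE1D = 1000
print( SIZE1D**4 * 8 / 1E12 )   # 1000^4 elements * 8 [B] each ~ 8.0 [TB]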

In case your platform stops being able to allocate the requested memory blocks, we head-bang into another kind of problem ( with a class of hidden glass-ceilings, if trying to go-parallel in a physical-resources-agnostic manner ). One may edit the SIZE1D scaling, so as to at least fit into the platform's RAM addressing / sizing capabilities, yet the performance envelopes of the real-world problem computing are still of great interest here:

>>> HowMuchWillWePAY2RUN( a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR, 200, 1000 )

may yield
a cost-to-pay anywhere between 0.1 [s] and 9+ [s] (!!)
just for doing STILL NOTHING, but now also not forgetting about some realistic MEM-allocation add-on costs "there"

CLK:: __________________116310 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________120054 [us] @   4-JOBs ran    10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________129441 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________123721 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________127126 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________124028 [us] @  10-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________305234 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________243386 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________241410 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________267275 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________244207 [us] @ 100-JOBs ran   100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________653879 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________405149 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________351182 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________362030 [us] @ 100-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________9325428 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________680429 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________533559 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________1125190 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________591109 [us] @ 200-JOBs ran  1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR

Test-case C:

kindly read the tail sections of this post

Test-case D:

kindly read the tail sections of this post


Epilogue:

For each and every "promise", the fairest next step is to first cross-validate the actual code-execution costs, before starting any code re-engineering. The sum of the real-world platform's add-on costs may devastate any expected speedup, even if the original, overhead-naive Amdahl's Law might have promised some speedup effects.

As Mr. W. Edwards Deming has expressed many times, without DATA we leave ourselves with nothing but OPINIONS.


A bonus part:
having read as far as here, one might have already found that there is no "drawback" or "error" in #Line2 per se: .map_async() returns an AsyncResult immediately, while .get() blocks until the work-packages are processed and the results transferred back, so that is simply where the wall-time surfaces. Still, careful design practice will look for a better syntax-constructor, one that spends less to achieve more ( as the actual resources ( CPU, MEM, IOs, O/S ) of the code-execution platform permit ). Anything else is not principally different from blindly telling Fortune.
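
( One such direction, a sketch only, where func and its workload are assumptions: since all the dispatch + compute + result-collection wall-time surfaces only inside .get(), the overheads can at least be amortised by instantiating the Pool once, re-using it across many calls and sending fewer, larger work-packages via the chunksize parameter: )

import multiprocessing as mp
import numpy           as np

def func( x ):                                         # an assumed, non-trivial payload
    s = 0.
    for _ in range( 10000 ):
        s += x * x
    return s

if __name__ == '__main__':
    inp  = np.linspace( 0.01, 1.99, 100 )
    pool = mp.Pool( processes = 4 )                    # pay the instantiation costs ONCE
    try:
        for _ in range( 10 ):                          # ...re-use the pool many times
            result = pool.map_async( func, inp,
                                     chunksize = 25 )  # fewer, larger work-packages
            output = result.get()                      # blocks - the wall-time shows up here
    finally:
        pool.close(); pool.join()

Whether such a setup beats the plain [SEQ]-of-calls on a given platform is, again, a question only the Stopwatch can answer.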
