Why does one piece of the whole computation (benchmking_f in this case) take so much longer in parallel than in the sequential approach?


Question

I am trying to compare sequential and parallel computation in Python.

This is the benchmark function.

def benchmking_f(n=0):
    import time
    items = range(int(10**(6+n)))

    def f2(x):return x*x

    start = time.time()
    sum_squared = 0
    for i in items:
        sum_squared += f2(i)
    return time.time() - start

This sequential computation

problem_size = 2

import time
start = time.time()
tlist = []
for i in range(5):
    tlist.append(benchmking_f(problem_size))
print('for loop took {}s'.format(time.time() - start))
print('each iterate took')
print(tlist)

took about 70s to finish the job; each iteration took [14.209498167037964, 13.92169737815857, 13.949078798294067, 13.94432258605957, 14.004642486572266]

This parallel approach

problem_size = 2

import itertools
import multiprocessing
import time
start = time.time()
pool = multiprocessing.Pool(5)
tlist = list(pool.map(benchmking_f, itertools.repeat(problem_size, 5)))
print('pool.map took {}s'.format(time.time() - start))
print('each iterate took')
print(tlist)

took about 42.45s; each iteration took [41.17476940155029, 41.92032074928284, 41.50966739654541, 41.348535776138306, 41.06284761428833]

One piece of the whole computation (benchmking_f in this case) took about 14s when run sequentially but about 41s when run in parallel (42.45s for the whole parallel run).

Why is that?

Note: I am not asking about the total time. I am asking about the time taken by one piece of the whole computation: one iteration of the for loop in the sequential case, and one process/thread in the parallel case.

1-iter benchmking_f takes.

Recommended answer

How many physical (not logical) cores do you have? You're trying to run 5 copies of the function simultaneously, the function takes 100% of one core for as long as it runs, and unless you have at least 5 physical cores they're going to fight each other tooth and nail for cycles.
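As a rough way to check the core counts, here is a minimal sketch. `os.cpu_count()` is in the standard library and reports logical CPUs. The physical count relies on the third-party `psutil` package, which is an assumption here (the question does not use it), so it is treated as optional. The helper name `core_counts` is made up for illustration.

```python
import os

def core_counts():
    """Return (logical, physical) core counts; physical is None if unknown."""
    logical = os.cpu_count()  # logical CPUs, hyper-threads included
    try:
        import psutil  # third-party and optional (assumption: may not be installed)
        physical = psutil.cpu_count(logical=False)
    except ImportError:
        physical = None
    return logical, physical

logical, physical = core_counts()
print('logical cores:', logical)
print('physical cores:', physical if physical is not None else 'unknown')
```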

I have 4 physical cores, but want to use my machine for other things too, so I replaced your Pool(5) with Pool(3). Then the per-iteration timings were about the same either way.
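One way to sketch that choice in code is to cap the pool at the number of cores you are willing to give up. The helper names `pick_pool_size` and `run_in_pool` and the `reserve` parameter are hypothetical, not part of `multiprocessing`.

```python
import multiprocessing
import os

def pick_pool_size(n_tasks, n_cores, reserve=1):
    """Workers to start: never more than tasks, leaving `reserve` cores free."""
    usable = max(1, n_cores - reserve)
    return min(n_tasks, usable)

def run_in_pool(func, args, n_cores=None, reserve=1):
    """Map func over args using a pool sized by pick_pool_size."""
    if n_cores is None:
        # Logical count; pass the physical count explicitly if you know it.
        n_cores = os.cpu_count() or 1
    workers = pick_pool_size(len(args), n_cores, reserve)
    with multiprocessing.Pool(workers) as pool:
        return pool.map(func, args)

print(pick_pool_size(5, 4))  # 4 cores, one kept free -> 3
```

With the question's function this would be called as `run_in_pool(benchmking_f, [problem_size] * 5, n_cores=4)`. On platforms that start workers by spawning a fresh interpreter, that call should sit under an `if __name__ == '__main__':` guard.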

Suppose you have a task that nails 100% of a CPU for T seconds. If you want to run S copies of that task simultaneously, that requires T*S cpu-seconds in total. If you have C entirely free physical cores to throw at it, at most min(C, S) cores can be working on the aggregate simultaneously, so to a first approximation the time needed will be:

T*S / min(C, S)
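The same first approximation as a small function, for trying different core counts. The name `estimated_wall_time` is made up for illustration, and the result ignores process start-up overhead and any other load on the machine.

```python
def estimated_wall_time(t, s, c):
    """First approximation: t*s CPU-seconds spread over at most min(c, s) cores."""
    if t < 0 or s < 1 or c < 1:
        raise ValueError('need t >= 0, s >= 1 and c >= 1')
    return t * s / min(c, s)

# T = 14 seconds per task, S = 5 tasks, varying the number of free cores.
for cores in (1, 2, 4, 5, 8):
    print(cores, 'cores ->', estimated_wall_time(14, 5, cores), 'seconds')
```

Once C reaches S, extra cores no longer help in this model: each task still needs its own T seconds.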

As another reply said, when you have more processes running than cores, the OS cycles through the processes for the duration, acting to make them all take about the same amount of wall-clock time (during some amount of which each process is doing nothing at all except waiting for the OS to let it run again for a while).
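One way to see that waiting directly is to record CPU time next to wall-clock time. This sketch is a variant of the question's benchmark, not the original code. It uses `time.perf_counter()` for elapsed time and `time.process_time()` for the CPU time of the current process only, and the function name `timed_sum_of_squares` is made up for illustration.

```python
import time

def timed_sum_of_squares(n):
    """Return (wall_seconds, cpu_seconds) for summing the squares below n."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time used by this process only
    total = 0
    for i in range(n):
        total += i * i
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

wall, cpu = timed_sum_of_squares(10**6)
print('wall {:.3f}s, cpu {:.3f}s'.format(wall, cpu))
```

If this is mapped over a pool with more workers than cores, the expectation is that each worker's CPU time stays near its sequential value while its wall time grows. The gap is time spent waiting to be scheduled.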

I'm guessing you have 2 physical cores. For your example, T is about 14 seconds, and S is 5, so if you had C=2 cores that works out to

14*5 / min(2, 5) = 14*5/2 = 35

seconds. You're actually seeing something closer to 41. Overheads account for part of that, but it seems likely your machine was also doing other work at the same time, so your test run didn't get 100% of the 2 cores.
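The estimate can also be turned around to ask how many cores' worth of CPU the run effectively received. This is only a sketch under the same first approximation. The function name `effective_cores` is made up, and 41.4 is roughly the average of the five per-call timings reported in the question.

```python
def effective_cores(t, s, observed):
    """Cores' worth of CPU actually received, assuming t*s CPU-seconds of work."""
    return t * s / observed

# T = 14 s per task, S = 5 tasks, about 41.4 s observed per call in parallel.
print(round(effective_cores(14, 5, 41.4), 2))  # about 1.69
```

A value somewhat below 2 is consistent with the guess of two physical cores that were not fully available to the test.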
