Testing Python multiprocessing: low speed because of overhead?

Question

I'm trying to learn about multiprocessing in Python (2.7). My CPU has 4 cores. In the following code I compare the speed of parallel versus serial execution of the same basic instruction.

I find that the time taken using the 4 cores is only 0.67 times the time taken using a single core, while naively I'd expect about 0.25.

Is overhead the reason? Where does it come from? Aren't the 4 processes independent?

I also tried pool.map and pool.map_async, with very similar results in terms of speed.

from multiprocessing import Process
import time

def my_process(a):
    # CPU-bound busy loop: a[1] outer iterations of 10000 increments each.
    for i in range(0,a[1]):
        j=0
        while j<10000:
            j = j+1
    print(a,j)

if __name__ == '__main__':
    # arguments to pass:
    a = ((0,2000),(1,2000),(2,2000),(3,2000))

    # --- 1) parallel processes:
    # 4 cores go up to 100% each here
    t0 = time.time()
    proc1 = Process(target=my_process, args=(a[0],))
    proc2 = Process(target=my_process, args=(a[1],))
    proc3 = Process(target=my_process, args=(a[2],))
    proc4 = Process(target=my_process, args=(a[3],))
    proc1.start(); proc2.start(); proc3.start(); proc4.start()
    proc1.join() ; proc2.join() ; proc3.join() ; proc4.join()
    dt_parallel = time.time()-t0
    print("parallel : " + str(dt_parallel))

    # --- 2) serial process :
    # 1 core only goes up to 100%
    t0 = time.time()
    for k in a:
        my_process(k)
    dt_serial = time.time()-t0
    print("serial : " + str(dt_serial))

    print("t_par / t_ser = " + str(dt_parallel/dt_serial))

EDIT: my PC actually has 2 physical cores (2 = 2 cores per socket * 1 socket, from lscpu [thanks @goncalopp]). If I run the above script with only the first 2 processes I get a ratio of 0.62, not very different from the one obtained with 3 or 4 processes. I guess it won't be easy to go faster than that.

I tested on another PC where lscpu reports CPU(s): 32, Thread(s) per core: 2, Core(s) per socket: 8, Socket(s): 2, and I get a ratio of 0.34, similar to @dano.

Thanks for your help.

Recommended answer

Yes, this may be related to overhead, including:

  • Creating and starting the processes
  • Passing the function and its arguments to them
  • Waiting for the processes to terminate
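
The three costs listed above can be estimated in isolation by timing processes that do no work at all. The following is a minimal sketch, not part of the original answer; the names `noop` and `measure_startup_overhead` are hypothetical, and the figure it prints depends on the operating system and on how it starts new processes.

```python
from multiprocessing import Process
import time


def noop():
    """Do nothing, so that any measured time is pure process overhead."""
    return None


def measure_startup_overhead(n_procs=4):
    """Return the seconds needed to create, start and join n_procs idle processes."""
    t0 = time.time()
    procs = [Process(target=noop) for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - t0


if __name__ == '__main__':
    print("overhead for 4 idle processes: " + str(measure_startup_overhead(4)))
```

If this figure is small compared with dt_parallel from the question, process start-up alone is unlikely to explain the gap, and the number of physical cores is the more likely limit.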

If you truly have 4 physical cores on your machine (and not 2 cores with hyperthreading or similar), you should see the ratio get closer to the expected value for larger inputs, as chepner said. If you only have 2 physical cores, you can't get a ratio below 0.5.
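
The 0.5 bound and the effect of larger inputs can be illustrated with a simple idealized model. This is a sketch under stated assumptions, not a measurement: all tasks take the same time, hyperthreading gives no speedup, and the parallel run pays a single fixed overhead. The function name `ideal_ratio` and its parameters are hypothetical.

```python
import math


def ideal_ratio(n_tasks, n_cores, task_time=1.0, overhead=0.0):
    """Idealized t_par / t_ser for n_tasks equal tasks on n_cores physical cores.

    Assumptions (not from the original answer): every task takes task_time
    seconds, tasks run in rounds of at most n_cores at a time, and the
    parallel run pays one fixed overhead in seconds.
    """
    rounds = math.ceil(n_tasks / float(n_cores))  # float() keeps this correct on Python 2
    t_parallel = overhead + rounds * task_time
    t_serial = n_tasks * task_time
    return t_parallel / t_serial


if __name__ == '__main__':
    print(ideal_ratio(4, 4))  # 4 tasks on 4 cores
    print(ideal_ratio(4, 2))  # 4 tasks on 2 cores
    # Same fixed overhead, growing task size: the ratio approaches 1/n_cores.
    for task_time in (0.1, 1.0, 10.0):
        print(ideal_ratio(4, 4, task_time=task_time, overhead=0.1))
```

In this model, 4 tasks on 4 cores give 0.25 and 4 tasks on 2 cores give 0.5. With a fixed overhead, the ratio only approaches those values as the time per task grows.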
