testing python multiprocessing: low speed because of overhead?
Question
I'm trying to learn about multiprocessing in python (2.7). My CPU has 4 cores. In the following code I test the speed of parallel vs. serial execution of the same basic instruction.
I find that the time taken using the 4 cores is only 0.67 times the one taken by a single core, while naively I'd expect ~0.25.
Is overhead the reason? Where does it come from? Aren't the 4 processes independent?
I also tried pool.map and pool.map_async, with very similar results in terms of speed.
from multiprocessing import Process
import time

def my_process(a):
    for i in range(0, a[1]):
        j = 0
        while j < 10000:
            j = j + 1
        print(a, j)

if __name__ == '__main__':
    # arguments to pass:
    a = ((0, 2000), (1, 2000), (2, 2000), (3, 2000))

    # --- 1) parallel processes:
    # 4 cores go up to 100% each here
    t0 = time.time()
    proc1 = Process(target=my_process, args=(a[0],))
    proc2 = Process(target=my_process, args=(a[1],))
    proc3 = Process(target=my_process, args=(a[2],))
    proc4 = Process(target=my_process, args=(a[3],))
    proc1.start(); proc2.start(); proc3.start(); proc4.start()
    proc1.join(); proc2.join(); proc3.join(); proc4.join()
    dt_parallel = time.time() - t0
    print("parallel : " + str(dt_parallel))

    # --- 2) serial process:
    # 1 core only goes up to 100%
    t0 = time.time()
    for k in a:
        my_process(k)
    dt_serial = time.time() - t0
    print("serial   : " + str(dt_serial))
    print("t_par / t_ser = " + str(dt_parallel / dt_serial))
EDIT: my PC actually has 2 physical cores (2 = 2 cores per socket * 1 socket, from lscpu [thanks @goncalopp]). If I run the above script with only the first 2 processes I get a ratio of 0.62, not that different from the one obtained with 3 or 4 processes. I guess it won't be easy to go faster than that.
I tested on another PC with lscpu: CPU(s): 32, Thread(s) per core: 2, Core(s) per socket: 8, Socket(s): 2, and I get a ratio of 0.34, similar to @dano's.
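Note that the standard library only reports logical CPUs, which is why lscpu is needed to see the physical layout. A small sketch of the distinction:

```python
import multiprocessing

# cpu_count() reports *logical* CPUs, so with hyperthreading it can be
# double the physical core count that `lscpu` shows
n_logical = multiprocessing.cpu_count()
print("logical CPUs: " + str(n_logical))
```

The physical core count is not exposed by the standard library; a third-party package such as psutil can report it via `psutil.cpu_count(logical=False)`.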
Thanks for your help.
Answer
Yes, this may be related to overhead, including:
- creating and starting the processes
- passing the function and arguments to them
- waiting for the processes to terminate
If you truly have 4 physical cores on your machine (and not 2 cores with hyperthreading or similar), you should see that the ratio becomes closer to what is expected for larger inputs, as chepner said. If you only have 2 physical cores, you can't get a ratio < 0.5.