Python multiprocessing: no performance gain with multiple processes
Question
Using multiprocessing, I tried to parallelize a function, but I get no performance improvement:
```python
import time

from MMTK import *
from MMTK.Trajectory import Trajectory, TrajectoryOutput, SnapshotGenerator
from MMTK.Proteins import Protein, PeptideChain
import numpy as np

filename = 'traj_prot_nojump.nc'
trajectory = Trajectory(None, filename)

def calpha_2dmap_mult(trajectory=trajectory, t=range(0, len(trajectory))):
    dist = []
    universe = trajectory.universe
    proteins = universe.objectList(Protein)
    chain = proteins[0][0]
    traj = trajectory[t]
    dt = 1000  # calculate distance every 1000 steps
    for n, step in enumerate(traj):
        if n % dt == 0:
            universe.setConfiguration(step['configuration'])
            for i in np.arange(len(chain) - 1):
                for j in np.arange(len(chain) - 1):
                    dist.append(universe.distance(chain[i].peptide.C_alpha,
                                                  chain[j].peptide.C_alpha))
    return dist

c0 = time.time()
dist1 = calpha_2dmap_mult(trajectory, range(0, 11001))
c1 = time.time() - c0
print(c1)

# Multiprocessing
from multiprocessing import Pool, cpu_count
pool = Pool(processes=4)
c0 = time.time()
dist_pool = [pool.apply(calpha_2dmap_mult, args=(trajectory, t))
             for t in [range(0, 2001), range(3000, 5001), range(6000, 8001),
                       range(9000, 11001)]]
c1 = time.time() - c0
print(c1)
```
The time spent calculating the distances is essentially the same with and without multiprocessing (70.1 s without, 70.2 s with)! I wasn't necessarily expecting a fourfold improvement, but I was at least expecting some improvement! Does anyone know what I did wrong?
Recommended answer
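`Pool.apply` is a blocking operation, so each call in the list comprehension waits for its chunk to finish before the next chunk is submitted, and the four chunks never run concurrently. The documentation says: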
> [`Pool.apply` is the] equivalent of the `apply()` built-in function. It blocks until the result is ready, so `apply_async()` is better suited for performing work in parallel.
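A minimal sketch of the difference that runs without MMTK. It assumes Python 3 and a hypothetical worker `fake_chunk_work` that sleeps for `DELAY` seconds to stand in for the expensive distance loop; neither name comes from the question. The `apply` version should take roughly four times `DELAY`, while the `apply_async` version queues all four chunks before waiting on any result.

```python
import time
from multiprocessing import Pool

DELAY = 0.2  # hypothetical: seconds of "work" per chunk
RANGES = [range(0, 2001), range(3000, 5001), range(6000, 8001), range(9000, 11001)]


def fake_chunk_work(steps):
    """Hypothetical stand-in for calpha_2dmap_mult on one chunk of frames."""
    time.sleep(DELAY)  # pretend this is the expensive distance calculation
    return sum(steps)


def run_with_apply(pool):
    # Each pool.apply call blocks until its result is ready, so the
    # chunks are processed strictly one after another.
    return [pool.apply(fake_chunk_work, args=(r,)) for r in RANGES]


def run_with_apply_async(pool):
    # apply_async returns an AsyncResult immediately: all chunks are queued
    # first, then .get() waits for each result in submission order.
    pending = [pool.apply_async(fake_chunk_work, args=(r,)) for r in RANGES]
    return [p.get() for p in pending]


def timed(fn, pool):
    start = time.perf_counter()
    result = fn(pool)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        seq, t_seq = timed(run_with_apply, pool)
        par, t_par = timed(run_with_apply_async, pool)
    print(f"apply:       {t_seq:.2f} s")
    print(f"apply_async: {t_par:.2f} s")
    assert seq == par  # same results, different scheduling
```

With at least four free cores, the second timing should be close to a single `DELAY`; real CPU-bound work also pays for pickling arguments and results between processes.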
In this case `Pool.map` is likely more appropriate for collecting the results: the `map` call itself blocks, but the sequence elements are transformed in parallel across the worker processes.
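A minimal `Pool.map` sketch, again assuming Python 3. The one-argument worker `fake_distances` is hypothetical: instead of computing C-alpha distances, it returns the frame numbers that would be sampled from its chunk with the question's `dt = 1000`.

```python
from multiprocessing import Pool

DT = 1000  # sample every 1000th frame of a chunk, as in the question
RANGES = [range(0, 2001), range(3000, 5001), range(6000, 8001), range(9000, 11001)]


def fake_distances(steps):
    """Hypothetical one-argument stand-in for calpha_2dmap_mult.

    Returns the absolute frame numbers that would be analysed in this chunk
    instead of real C-alpha distances.
    """
    return list(steps[::DT])


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map blocks until every chunk is finished, but the chunks themselves
        # are distributed over the worker processes and run concurrently.
        per_chunk = pool.map(fake_distances, RANGES)
    # Results come back in the same order as RANGES, so flattening them
    # reproduces the serial frame order.
    frames = [f for chunk in per_chunk for f in chunk]
    print(frames)
```

The printed list runs from frame 0 to frame 11000 in steps of 1000, the same frames the serial call over `range(0, 11001)` would visit.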
In addition to using partial application (or a manual equivalent of it), also consider expanding the data itself. It's the same cat in a different skin.
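Partial application with `functools.partial` turns a two-argument function into the one-argument callable that `Pool.map` expects. A minimal sketch, assuming Python 3, where `fake_calpha_2dmap` and the list `fake_trajectory` are hypothetical stand-ins for `calpha_2dmap_mult` and the MMTK `Trajectory`. The bound argument is pickled and sent to the workers with the tasks, so with a real `Trajectory` this relies on that object being picklable.

```python
from functools import partial
from multiprocessing import Pool

RANGES = [range(0, 2001), range(3000, 5001), range(6000, 8001), range(9000, 11001)]


def fake_calpha_2dmap(trajectory, t, dt=1000):
    """Hypothetical two-argument stand-in for calpha_2dmap_mult(trajectory, t)."""
    frames = trajectory[t.start:t.stop]  # slice out this chunk of frames
    # Keep every dt-th frame of the chunk, like the `n % dt == 0` test.
    return [frame for n, frame in enumerate(frames) if n % dt == 0]


if __name__ == "__main__":
    fake_trajectory = [f"frame{i}" for i in range(11001)]  # hypothetical data
    # partial fixes the first argument; the resulting callable takes only `t`.
    worker = partial(fake_calpha_2dmap, fake_trajectory)
    with Pool(processes=4) as pool:
        results = pool.map(worker, RANGES)
    print([len(r) for r in results])  # three sampled frames per chunk
```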
```python
data = ((trajectory, r) for r in [range(0,2001), ..])
result = pool.map(.., data)
```
This can then be expanded with a small proxy function:
```python
def apply_data(d):
    return calpha_2dmap_mult(*d)

result = pool.map(apply_data, data)
```
The function (or a simple argument-expanding proxy for it) needs to accept a single argument, but all the data is now mapped as a single unit.
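The tuple-expanding proxy can be checked with the same kind of stand-in; `fake_calpha_2dmap` and `fake_trajectory` are hypothetical, as before. For comparison, the sketch also uses `Pool.starmap` (Python 3.3 and later), which unpacks each argument tuple itself and is an alternative to writing the proxy.

```python
from multiprocessing import Pool

RANGES = [range(0, 2001), range(3000, 5001), range(6000, 8001), range(9000, 11001)]


def fake_calpha_2dmap(trajectory, t, dt=1000):
    """Hypothetical two-argument stand-in for calpha_2dmap_mult(trajectory, t)."""
    frames = trajectory[t.start:t.stop]
    return [frame for n, frame in enumerate(frames) if n % dt == 0]


def apply_data(d):
    # Single-argument proxy: unpack one (trajectory, range) tuple.
    return fake_calpha_2dmap(*d)


if __name__ == "__main__":
    fake_trajectory = list(range(11001))  # hypothetical: frame i is just i
    # Each element of `data` is a complete argument tuple for one task.
    data = [(fake_trajectory, r) for r in RANGES]
    with Pool(processes=4) as pool:
        via_proxy = pool.map(apply_data, data)
        via_starmap = pool.starmap(fake_calpha_2dmap, data)  # unpacks for us
    assert via_proxy == via_starmap
    print(via_proxy)
```

Since every tuple references the trajectory, it travels to the workers as part of the task data; for a large object, that serialization cost can offset part of the parallel speedup.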