Python: how to parallelize loops
Question
I am very new to multi-threading and multi-processing and am trying to make a for loop run in parallel. I searched for similar questions and wrote code based on the multiprocessing module:
```python
import timeit, multiprocessing

start_time = timeit.default_timer()
d1 = dict( (i,tuple([i*0.1,i*0.2,i*0.3])) for i in range(500000) )
d2={}

def fun1(gn):
    for i in gn:
        x,y,z = d1[i]
        d2.update({i:((x+y+z)/3)})

if __name__ == '__main__':
    gen1 = [x for x in d1.keys()]
    fun1(gen1)
    #p= multiprocessing.Pool(3)
    #p.map(fun1,gen1)
    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)
```
Output:

```
Script finished
0.8113944193950299
```
If I change the code to:
```python
    #fun1(gen1)
    p= multiprocessing.Pool(5)
    p.map(fun1,gen1)
```
I get this error:
```
    for i in gn:
TypeError: 'int' object is not iterable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    raise self._value
```
Any ideas how to make this parallel? MATLAB has a `parfor` option for parallel loops. I am trying to parallelize the loop this way, but it does not work. Also, what if the function returns a value: can I write something like `a,b,c=p.map(fun1,gen1)` if `fun1()` returns 3 values?
(Running Python 3.6 on Windows)
Recommended answer
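As @Alex Hall mentioned, remove the iteration from `fun1`. `p.map(fun1, gen1)` already loops over `gen1` and calls `fun1` once per element, so `fun1` receives a single `int` key. That is why `for i in gn` fails with `TypeError: 'int' object is not iterable`. Also, wait until all of the pool's workers have finished (`close()` followed by `join()`).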
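Here is a minimal sketch of that behaviour, independent of the question's code. The names `show_arg` and `run_demo` are made up for illustration. `Pool.map` does the iterating itself and passes each element to the function separately, so the function never sees the whole list:

```python
from multiprocessing import Pool


def show_arg(item):
    # Hypothetical helper: report exactly what Pool.map passes in.
    return (item, type(item).__name__)


def run_demo(keys):
    # Pool.map iterates over `keys` and calls show_arg(key) once per element,
    # returning the results in the same order as `keys`.
    with Pool(2) as p:
        return p.map(show_arg, keys)


if __name__ == '__main__':
    print(run_demo([0, 1, 2]))
    # [(0, 'int'), (1, 'int'), (2, 'int')]
```

The `if __name__ == '__main__':` guard matters on Windows. There, each worker process re-imports the main module, and the guard stops the workers from creating pools of their own.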
PEP 8 note: `import timeit, multiprocessing` is discouraged. Put each import on its own line.
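```python
import multiprocessing
import timeit

start_time = timeit.default_timer()

# Module-level data: on Windows every worker process re-imports this module,
# so each worker rebuilds its own copy of d1.
d1 = {i: (i * 0.1, i * 0.2, i * 0.3) for i in range(500000)}


def fun1(gn):
    # Handles ONE key; the loop lives outside (serial loop or Pool.map).
    x, y, z = d1[gn]
    # Return the result instead of updating a global dict: in the parallel
    # run fun1 executes in worker processes, and changes to their copy of a
    # global are never seen by the main process.
    return (x + y + z) / 3


if __name__ == '__main__':
    gen1 = list(d1.keys())

    # serial processing
    d2 = {gn: fun1(gn) for gn in gen1}

    # parallel processing: map returns results in the same order as gen1
    p = multiprocessing.Pool(3)
    d2 = dict(zip(gen1, p.map(fun1, gen1)))
    p.close()
    p.join()

    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)
```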
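On the follow-up about returning several values: `p.map` always returns one list with one result per input item. That means `a,b,c=p.map(fun1,gen1)` only works if `gen1` has exactly three items. If the function returns a 3-tuple, one approach is to collect the list of tuples and transpose it with `zip(*results)`. The sketch below uses a hypothetical `stats` function and sample keys for illustration:

```python
from multiprocessing import Pool


def stats(key):
    # Hypothetical per-item function that returns three values.
    x, y, z = key * 0.1, key * 0.2, key * 0.3
    return x + y + z, (x + y + z) / 3, max(x, y, z)


def parallel_stats(keys, processes=3):
    # Assumes `keys` is non-empty (zip(*[]) cannot be unpacked into three names).
    with Pool(processes) as p:
        # One 3-tuple per key, in the same order as `keys`.
        results = p.map(stats, keys)
    # Transpose [(a0, b0, c0), (a1, b1, c1), ...] into three tuples.
    sums, means, maxima = zip(*results)
    return sums, means, maxima


if __name__ == '__main__':
    a, b, c = parallel_stats(range(5))
    print(len(a), len(b), len(c))  # 5 5 5
```

Using the pool as a context manager shuts it down on exit. That is safe here because `map()` blocks until every result is back. The per-item work in the question is tiny, so sending arguments and results between processes may cost more than the parallelism saves. Timing both versions is the reliable way to find out.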