How to parallelize complicated for loops
Question
I have a complicated for loop that performs multiple operations on multiple records. The loop looks like this:
# `is` is a reserved word in Python, so the first list is named is_ here
for i, j, k in zip(is_, js, ks):
    # declare multiple lists, like
    a = []
    b = []
    # ...
    if i:
        for items in i:
            values = items['key'].split("--")
            # append the values to the declared lists
            a.append(values[0])
            b.append(values[1])
    # also other operations with j and k; i is a list of dicts
    if "substring" in k:
        for k, v in j["key"].items():
            l = "string"
            t = v
    else:
        for k, v in j["key2"].items():
            l = k
            t = v
    # construct an object with all the lists/params
    content = {
        'sub_content': {
            "a": a,
            "b": b,
            # ...
        }
    }
    # form a tuple. We are interested in this tuple.
    data_tuple = (content, t, l)
Considering the above for loop, how do I parallelize it? I've looked into multiprocessing, but I have not been able to parallelize such a complex loop. I am also open to suggestions that might perform better here, including parallel language paradigms like OpenMP/MPI/OpenACC.
Recommended answer
You can use the Python multiprocessing library. As noted in this excellent answer, you should figure out whether you need multi-processing or multi-threading.
Bottom line: if you need multi-threading, use multiprocessing.dummy. If you are only doing CPU-intensive tasks with no IO or dependencies, use multiprocessing.
multiprocessing.dummy is exactly the same as the multiprocessing module, but uses threads instead (an important distinction: use multiple processes for CPU-intensive tasks; threads for (and during) IO).
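If the loop body turns out to be CPU-bound, the process-based pool has the same map interface. The following is only a minimal sketch: cpu_bound and run_in_processes are hypothetical names that are not part of the question, and the sum of squares stands in for whatever real work each record needs.

```python
from multiprocessing import Pool


def cpu_bound(record):
    """Hypothetical CPU-heavy worker: sum of squares up to i + j + k."""
    i, j, k = record
    return sum(x * x for x in range(i + j + k))


def run_in_processes(records, workers=4):
    # Each worker is a separate process with its own interpreter,
    # so CPU-bound work is not serialized by the GIL.
    with Pool(processes=workers) as pool:
        return pool.map(cpu_bound, records)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on
    # platforms that start workers by re-importing the main module.
    records = list(zip(range(100), range(100), range(100)))
    results = run_in_processes(records)
    print(results[:3])
```

The worker has to be a module-level function so it can be pickled and sent to the child processes. Swapping the import for multiprocessing.dummy gives the threaded variant without changing the rest.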
Set up the zip object
#!/usr/bin/env python3
import numpy as np
n = 2000
xs = np.arange(n)
ys = np.arange(n) * 2
zs = np.arange(n) * 3
zip_obj = zip(xs, ys, zs)
A simple example function
def my_function(my_tuple):
    iv, jv, kv = my_tuple
    return f"{iv}-{jv}-{kv}"
Set up multithreading
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
data_tuple = pool.map(my_function, zip_obj)
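Two details are worth checking on a small input: pool.map returns results in the same order as the input, and a zip object can only be iterated once. Here is a self-contained sketch of that check. The five-element ranges are made-up sample data, and the with block is one way to make sure the pool is shut down afterwards.

```python
from multiprocessing.dummy import Pool as ThreadPool


def my_function(my_tuple):
    iv, jv, kv = my_tuple
    return f"{iv}-{jv}-{kv}"


# zip() returns a one-shot iterator, so materialize it as a list
# when the same records are needed more than once.
records = list(zip(range(5), range(0, 10, 2), range(0, 15, 3)))

# The context manager shuts the pool down when the block exits.
with ThreadPool(4) as pool:
    parallel = pool.map(my_function, records)

# Plain loop for comparison: same values, same order.
sequential = [my_function(r) for r in records]
print(parallel == sequential)
```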
Your full example
def my_function(my_tuple):
    i, j, k = my_tuple
    # declare multiple lists, like
    a = []
    b = []
    # ...
    if i:
        for items in i:
            values = items['key'].split("--")
            # append the values to the declared lists
            a.append(values[0])
            b.append(values[1])
    # also other operations with j and k; i is a list of dicts
    if "substring" in k:
        for k, v in j["key"].items():
            l = "string"
            t = v
    else:
        for k, v in j["key2"].items():
            l = k
            t = v
    # construct an object called content with all the lists/params, like
    content = {
        'sub_content': {
            "a": a,
            "b": b,
            # ...
        }
    }
    # form a tuple. We are interested in this tuple.
    return (content, t, l)

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
# `is` is a reserved word in Python, so the first list is named is_ here
zip_obj = zip(is_, js, ks)
data_tuple = pool.map(my_function, zip_obj)
# Do whatever you need to do with data_tuple here
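To try this end to end, here is a small runnable sketch with made-up input. The sample data in i_list, j_list and k_list, and the names build_tuple, label and value, are assumptions for illustration only. The shapes are guessed from how the loop indexes i, j and k, so adjust them to your real records.

```python
from multiprocessing.dummy import Pool as ThreadPool


def build_tuple(record):
    """Process one (i, j, k) record and return (content, value, label)."""
    i, j, k = record
    a, b = [], []
    for item in i or []:
        values = item["key"].split("--")
        a.append(values[0])
        b.append(values[1])
    label, value = None, None
    if "substring" in k:
        for _, v in j["key"].items():
            label, value = "string", v
    else:
        for name, v in j["key2"].items():
            label, value = name, v
    content = {"sub_content": {"a": a, "b": b}}
    return (content, value, label)


# Hypothetical sample data: i is a list of dicts, j a dict of dicts, k a string.
i_list = [[{"key": "x--y"}, {"key": "p--q"}], []]
j_list = [
    {"key": {"m": 1}, "key2": {"n": 2}},
    {"key": {"m": 3}, "key2": {"n": 4}},
]
k_list = ["has substring", "other"]

with ThreadPool(2) as pool:
    data_tuples = pool.map(build_tuple, zip(i_list, j_list, k_list))

for row in data_tuples:
    print(row)
```

Every record builds its own lists and returns its own tuple, so the workers share no state and need no locking.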