Parallelizing a nested Python for loop

Problem description

What type of parallel Python approach would be suited to efficiently spreading the CPU-bound workload shown below? Is it feasible to parallelize the section? It looks like there is not much tight coupling between the loop iterations, i.e. portions of the loop could be handled in parallel so long as appropriate communication to reconstruct the store variable is done at the end. I'm currently using Python 2.7, but if a strong case can be made that this problem is easily handled in a newer version, then I will consider migrating the code base.

I have tried to capture the spirit of the computation with the example below. I believe that it has the same connectedness between the loops/variables as my actual code.

import numpy as np

nx = 20
ny = 30
myList1 = [0]*100
myList2 = [1]*25
value1 = np.zeros(nx)
value2 = np.zeros(ny)
store = np.zeros((nx, ny, len(myList1), len(myList2)))  # shape must be a tuple
for i in range(nx):
  for j in range(ny):
    f = calc(value1[i], value2[j])  # returns a list; defined elsewhere
    for k, data1 in enumerate(myList1):
      for p, data2 in enumerate(myList2):
        meanval = np.sum(f[:]/data1)*data2
        store[i,j,k,p] = meanval

Recommended answer

Here are two approaches you can take. Which one is wise also depends on where the bottleneck is, something best measured rather than guessed.
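For example, one quick way to measure rather than guess (a sketch only; run_computation is a hypothetical wrapper around your nested loops):

import cProfile

# Hypothetical: wrap the nested loops in a function named run_computation,
# then profile it to see whether time is spent in calc() or the Python loops.
cProfile.run('run_computation()', sort='cumulative')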

The ideal option would be to leave all low-level optimization to Numpy. Right now you have a mix of native Python code and Numpy code. The latter doesn't play well with loops. They work, of course, but by looping in Python you force operations to happen sequentially, in the order you specified. It's better to give Numpy operations that it can perform on as many elements at once as possible, i.e. matrix transformations. That benefits performance not only because of automatic (partial) parallelization; even single threads will be able to get more out of the CPU. A highly recommended read to learn more about this is From Python to Numpy.
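To sketch what that could look like here: since np.sum(f/data1)*data2 equals np.sum(f)/data1*data2 for scalar data1 and data2, the two inner loops collapse into a single broadcast. A minimal sketch, assuming calc returns a 1-D array (the helper name sums is introduced here for illustration):

import numpy as np

# Sketch under the above assumptions: compute the per-(i, j) sums once,
# then let broadcasting handle the k and p dimensions in one step.
sums = np.empty((nx, ny))
for i in range(nx):
    for j in range(ny):
        sums[i, j] = np.sum(calc(value1[i], value2[j]))

arr1 = np.asarray(myList1, dtype=float)  # zeros here would divide to inf/nan
arr2 = np.asarray(myList2, dtype=float)
store = (sums[:, :, None, None]
         / arr1[None, None, :, None]
         * arr2[None, None, None, :])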

If you do need to parallelize pure Python code, you have little choice but to go with multiple processes. For that, refer to the multiprocessing module. Rearrange the code into three steps:

  • Prepare the input for each job
  • Divide those jobs over a set of workers running in parallel (fork/map)
  • Collect the results (join/reduce)

You need to strike a balance between enough processes to make parallelizing worthwhile, and not so many that they will be too short-lived. The cost of spinning up processes and communicating with them would then become significant by itself.
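One knob for this trade-off is the chunksize argument of Pool.map, which batches several small jobs into each message sent to a worker. A rough sketch, where pool, arg_pairs, and calc_meanvals_for refer to the example script below:

from multiprocessing import cpu_count

# Batch several short jobs per worker message to amortize the IPC overhead;
# dividing by 4*cpu_count() is a common rule of thumb, not a tuned value.
chunk = max(1, (nx * ny) // (4 * cpu_count()))
return_values = pool.map(calc_meanvals_for, arg_pairs, chunk)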

A simple solution would be to generate a list of (i,j) pairs, so that there will be nx*ny jobs. Then make a function that takes such a pair as input and returns a list of (i,j,k,p,meanval) tuples. Try to use only the inputs to the function and return a result. Keep everything local; no side effects, et cetera. Read-only access to globals such as myList1 is okay, but modification requires the special measures described in the documentation. Pass the function and the list of inputs to a worker pool. Once it has finished producing partial results, combine all of those into your store.

Here is an example script:

from multiprocessing import Pool
import numpy as np

# Global variables are OK, as long as their contents are not modified, although
# these might just as well be moved into the worker function or an initializer
nx = 20
ny = 30
myList1 = [0]*100
myList2 = [1]*25
value1 = np.zeros(nx)
value2 = np.zeros(ny)

def calc_meanvals_for(pair):
    """Process a reasonably sized chunk of the problem"""
    i, j = pair
    f = calc(value1[i], value2[j])  # calc is assumed to be defined or imported
    results = []
    for k, data1 in enumerate(myList1):
        for p, data2 in enumerate(myList2):
            meanval = np.sum(f[:]/data1)*data2
            results.append((i,j,k,p,meanval))
    return results

# This module will be imported by every worker - that's how they will be able
# to find the global variables and the calc function - so make sure to check
# whether this is the main program, because without that, every worker would
# start more workers, each of which would start even more, in an endless loop
if __name__ == '__main__':
    # Create a pool of worker processes, each able to use a CPU core
    pool = Pool()
    # Prepare the arguments, one per function invocation (tuples to fake multiple)
    arg_pairs = [(i,j) for i in range(nx) for j in range(ny)]
    # Now comes the parallel step: given a function and a list of arguments,
    # have a worker invoke that function with one argument until all arguments
    # have been used, collecting the return values in a list
    return_values = pool.map(calc_meanvals_for, arg_pairs)
    # Since the function also returns a list, there's now a list of lists - consider
    # itertools.chain.from_iterable to flatten them - to be processed further
    store = np.zeros((nx, ny, len(myList1), len(myList2)))  # shape is a tuple
    for results in return_values:
        for i, j, k, p, meanval in results:
            store[i,j,k,p] = meanval
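
Regarding the possible migration mentioned in the question: from Python 3.3 on, Pool also works as a context manager, so worker cleanup becomes automatic. A sketch reusing the names from the script above:

# Python 3.3+ variant of the parallel step; on exiting the with-block,
# the pool's worker processes are shut down automatically.
if __name__ == '__main__':
    with Pool() as pool:
        return_values = pool.map(calc_meanvals_for, arg_pairs)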
