Alternative use patterns for python multiprocessing avoiding proliferation of global state?

Problem Description

This (enormously simplified) example works fine (Python 2.6.6, Debian Squeeze):

from multiprocessing import Pool
import numpy as np

src = None  # module-level global, filled in by main() before the Pool is created

def process(row):
    return np.sum(src[row])

def main():
    global src
    src = np.ones((100, 100))

    pool = Pool(processes=16)
    rows = pool.map(process, range(100))
    print rows

if __name__ == "__main__":
    main()
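
(This works because src is assigned before the Pool is created, so on Unix the forked worker processes inherit the module-level global without any pickling.)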

However, after years of being taught "global state bad!!!", all my instincts tell me I would really rather be writing something closer to:

from multiprocessing import Pool
import numpy as np

def main():
    src = np.ones((100, 100))

    # closure over src instead of a module-level global
    def process(row):
        return np.sum(src[row])

    pool = Pool(processes=16)
    rows = pool.map(process, range(100))
    print rows

if __name__ == "__main__":
    main()

But of course that doesn't work (it hangs, unable to pickle the nested process function).
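
The underlying failure is easy to reproduce directly, as in this minimal sketch (the standalone pickle check is mine, not part of the original question): multiprocessing pickles the mapped function by reference to its module-level name, and a function defined inside another function has no such name, so pickling fails (and in Python 2.6 that failure surfaces as pool.map hanging):

import pickle

def outer():
    def process(row):
        return row * 2
    try:
        pickle.dumps(process)  # functions are pickled by module-level name
    except Exception as e:     # pickle.PicklingError on Python 2
        print("cannot pickle the nested function: %s" % e)

outer()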

The example here is trivial, but by the time you add multiple "process" functions, each dependent on multiple additional inputs... well, it all becomes a bit reminiscent of something written in BASIC 30 years ago. Trying to use classes to at least aggregate the state with the appropriate functions seems an obvious solution, but it doesn't seem to be that easy in practice.

Is there some recommended pattern or style for using multiprocessing.pool that avoids the proliferation of global state needed to support each function I want to parallel-map over?

How do experienced "multiprocessing pros" deal with this?

Update: Note that I'm actually interested in processing much bigger arrays, so variations on the above which pickle src on each call/iteration aren't nearly as good as ones which fork it into the pool's worker processes.
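
For that case, one common pattern is Pool's initializer/initargs hook, shown here as a sketch (initializer and initargs are standard Pool parameters, but the _init_worker and _src names are mine, not from the original post): src reaches each worker exactly once at startup. On Unix the workers are forked, so src is inherited rather than pickled per task, and the global is at least confined to the worker processes and set in exactly one place:

from multiprocessing import Pool
import numpy as np

_src = None  # per-worker global, set once when each worker starts

def _init_worker(src):
    global _src
    _src = src

def process(row):
    return np.sum(_src[row])

def main():
    src = np.ones((100, 100))
    # initargs reaches each worker once at startup instead of being
    # pickled along with every task submitted to the pool
    pool = Pool(processes=16, initializer=_init_worker, initargs=(src,))
    rows = pool.map(process, range(100))
    print(rows)

if __name__ == "__main__":
    main()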

Recommended Answer

You could always pass a callable object like this; the object can then contain the shared state:

from multiprocessing import Pool
import numpy as np

class RowProcessor(object):
    """Callable that carries the shared state along with the function."""
    def __init__(self, src):
        self.__src = src

    def __call__(self, row):
        return np.sum(self.__src[row])

def main():
    src = np.ones((100, 100))
    p = RowProcessor(src)

    pool = Pool(processes=16)
    rows = pool.map(p, range(100))  # p is picklable, unlike a nested function
    print rows

if __name__ == "__main__":
    main()
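
This keeps each piece of state bundled with the callable that uses it, and it scales to multiple "process" functions as one small class per operation. One caveat (my observation, not part of the original answer): pool.map still pickles the callable, and hence src, when tasks are dispatched to the workers, so for the very large arrays mentioned in the update the fork-based variants avoid a copy that this version pays.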
