Python multiprocessing: is it possible to have a pool inside of a pool?


Question

I have a module A that does a basic map/reduce by taking data and sending it to modules B, C, D, etc for analysis and then joining their results together.

But it appears that modules B, C, D, etc cannot themselves create a multiprocessing pool, or else I get

AssertionError: daemonic processes are not allowed to have children

Is it possible to parallelize these jobs some other way?

For clarity, here's a(n admittedly bad) baby example. (I would normally try/catch but you get the gist.)

A.py:

  import B
  from multiprocessing import Pool

  def main():
    p = Pool()
    results = p.map(B.foo, range(10))
    p.close()
    p.join()
    return results

  # guard is needed so the pool can be created safely on all platforms
  if __name__ == "__main__":
    print(main())


B.py:

  from multiprocessing import Pool

  def foo(x):
    # this Pool() call is what raises
    # "AssertionError: daemonic processes are not allowed to have children",
    # because foo runs inside a daemonic worker started by A.py
    p = Pool()
    results = p.map(str, range(x))
    p.close()
    p.join()
    return results
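A quick way to see why this fails: Pool starts its workers with daemon=True, and daemonic processes are not allowed to spawn children. A minimal check (the `report` helper is mine, not part of the question):

```python
from multiprocessing import Pool, current_process

def report(_):
    # Pool workers run with daemon=True; that flag is exactly what
    # makes the Pool() call inside B.foo raise the AssertionError.
    return current_process().daemon

if __name__ == "__main__":
    with Pool(2) as p:
        print(p.map(report, range(4)))  # [True, True, True, True]
```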

Answer

is it possible to have a pool inside of a pool?

Yes, it is possible though it might not be a good idea unless you want to raise an army of zombies. From Python Process Pool non-daemonic?:

import multiprocessing.pool
from contextlib import closing
from functools import partial

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    # (note: _get_daemon/_set_daemon are Python 2 Process internals)
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class Pool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def foo(x, depth=0):
    if depth == 0:
        return x
    else:
        with closing(Pool()) as p:
            return p.map(partial(foo, depth=depth-1), range(x + 1))

if __name__ == "__main__":
    from pprint import pprint
    pprint(foo(10, depth=2))

Output

[[0],
 [0, 1],
 [0, 1, 2],
 [0, 1, 2, 3],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4, 5],
 [0, 1, 2, 3, 4, 5, 6],
 [0, 1, 2, 3, 4, 5, 6, 7],
 [0, 1, 2, 3, 4, 5, 6, 7, 8],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
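The recipe above hooks `_get_daemon`/`_set_daemon`, which only exist in Python 2; on Python 3 the `daemon` flag lives behind a plain property. A common adaptation for Python 3 (a sketch; `NoDaemonContext` and `NestablePool` are names I am introducing, not part of the standard library) overrides the property and passes a custom context, which `Pool` has accepted since Python 3.4:

```python
import multiprocessing
import multiprocessing.pool
from functools import partial

class NoDaemonProcess(multiprocessing.Process):
    # Override the daemon property so workers are never daemonic.
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass  # silently ignore Pool's attempt to set daemon=True

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

class NestablePool(multiprocessing.pool.Pool):
    # Inject the non-daemonic context so workers may spawn children.
    def __init__(self, *args, **kwargs):
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)

def foo(x, depth=0):
    if depth == 0:
        return x
    with NestablePool() as p:
        return p.map(partial(foo, depth=depth - 1), range(x + 1))

if __name__ == "__main__":
    from pprint import pprint
    pprint(foo(3, depth=2))
```

The same zombie-army caveat applies: every level of nesting multiplies the number of live processes.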

concurrent.futures supports it by default:

# $ pip install futures # on Python 2
from concurrent.futures import ProcessPoolExecutor as Pool
from functools import partial

def foo(x, depth=0):
    if depth == 0:
        return x
    else:
        with Pool() as p:
            return list(p.map(partial(foo, depth=depth-1), range(x + 1)))

if __name__ == "__main__":
    from pprint import pprint
    pprint(foo(10, depth=2))

It produces the same output.

Is it possible to parallelize these jobs some other way?

Yes. For example, look at how celery allows you to create complex workflows.
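Another way that stays in the standard library (a sketch; `inner` is a hypothetical helper): daemonic processes may not fork child processes, but they may start threads, so the inner level can use `multiprocessing.pool.ThreadPool` without hitting the AssertionError:

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def inner(x):
    return str(x)

def foo(x):
    # Threads are permitted inside daemonic Pool workers, so a
    # ThreadPool at the inner level avoids the AssertionError.
    with ThreadPool(4) as tp:
        return tp.map(inner, range(x + 1))

if __name__ == "__main__":
    with Pool() as p:
        print(p.map(foo, range(3)))  # [['0'], ['0', '1'], ['0', '1', '2']]
```

Note that this only buys real parallelism at the inner level when the work releases the GIL (I/O, NumPy, C extensions); for pure-Python CPU-bound work the inner threads run serially.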
