Python multiprocessing pool stuck

Question

I'm trying to run some sample code for Python's multiprocessing.Pool that I found on the web. The code is:

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)
    inputs = [0, 1, 2, 3, 4]
    outputs = pool.map(square, inputs)

But when I try to run it, it never finishes executing, and I have to restart the kernel of my IPython Notebook. What's the problem?

Recommended answer

As you can read in the answer that John pointed to in the comments, multiprocessing.Pool should not, in general, be expected to work well within an interactive interpreter. To understand why, consider how Pool does its job:

  • It forks Python workers, passing them the name of the current Python file.
  • The workers then essentially do import <this file> and listen for messages from the master.
  • The master sends function names along with function arguments to the workers via pickling. Note that the functions themselves are not sent: pickle serializes a function only as a reference to its module and name, not as code.
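The last point can be checked directly. The following is a minimal sketch, assuming Python 3 and using json.dumps purely as a convenient importable function; the helper can_pickle is a hypothetical name introduced for illustration.

```python
import json
import pickle

# Pickling an importable function stores only a reference to it
# (module name plus function name), not its code.
payload = pickle.dumps(json.dumps)
print(payload)

# Unpickling looks the function up again by importing its module.
restored = pickle.loads(payload)
print(restored is json.dumps)


def can_pickle(obj):
    """Hypothetical helper: report whether pickle accepts obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


# A lambda has no importable name, so pickle has nothing to refer to.
print(can_pickle(lambda x: x * x))
```

This is why a worker must be able to import the module that defines the function: the message it receives contains only the name to look up.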

When you try to do this from an interactive prompt, there is no reasonable "current Python file" to pass to the children for importing. Moreover, the functions you define at an interactive prompt are not part of any importable module (they are defined dynamically), so the children cannot import them from that nonexistent module. Your easiest bet is therefore simply to avoid using multiprocessing within IPython. IPython parallel is so much better anyway :)
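If you still want to use Pool from an interactive session, one workaround that follows from this explanation is to keep the worker function in a real module file that the children can import. The sketch below assumes Python 3; the module name pool_workers is hypothetical, and the file is generated in a temporary directory only to keep the example self-contained (normally you would write it by hand next to your notebook).

```python
import importlib
import os
import sys
import tempfile
from multiprocessing import Pool

# Create a hypothetical helper module on disk so that worker processes
# have an actual file to import the function from.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "pool_workers.py"), "w") as handle:
    handle.write("def square(x):\n    return x * x\n")

# Make the module importable here; multiprocessing passes sys.path on
# to the children it starts.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
pool_workers = importlib.import_module("pool_workers")

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        outputs = pool.map(pool_workers.square, [0, 1, 2, 3, 4])
    print(outputs)
```

Because square now lives in pool_workers rather than in the interactive session, the reference that pickle sends to the workers points at something they can import.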

For completeness' sake, I also checked what exactly happens in my particular case of IPython 4 running under Python 2.7 on Windows 8 (where I can observe the interpreter getting stuck as well). Interestingly, the reason IPython gets stuck in the first place is not one of those mentioned above.

It turns out that multiprocessing checks whether __main__.__file__ is defined and, if it is not, sends sys.argv[0] to the children as the "current filename". In (my version of) IPython, sys.argv[0] is equal to C:\Dev\Anaconda\lib\site-packages\ipykernel\__main__.py.
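To see what your own session would report, you can inspect the same two values. This is only a diagnostic sketch, not the code multiprocessing itself runs; describe_main is a hypothetical helper name.

```python
import sys


def describe_main():
    """Hypothetical helper: show the values discussed above."""
    main_module = sys.modules["__main__"]
    return {
        # None when __main__ has no __file__, as in many interactive sessions.
        "main_file": getattr(main_module, "__file__", None),
        # The fallback described above.
        "argv0": sys.argv[0] if sys.argv else None,
    }


print(describe_main())
```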

Unfortunately, before starting up, the worker processes check whether the module they are about to import is already in their sys.modules. Line 488 of multiprocessing/forking.py says:

assert main_name not in sys.modules, main_name

When main_name is __main__ (as is the case for IPython's workers), this assertion fails and the workers fail to start. The same code, however, is "smart" enough to check whether the passed name is ipython, in which case it performs no such check and does not import anything.

Consequently, the problem of workers failing to start can be solved with an ugly hack: defining __main__.__file__ to be equal to ipython. The following code does work fine from an IPython cell:

import sys
sys.modules['__main__'].__file__ = 'ipython'
from multiprocessing import Pool

pool = Pool(processes=4)
inputs = [0, 1, 2, 3, 4]
outputs = pool.map(abs, inputs)

Note that this example asks the workers to compute abs, a built-in function. It would fail (gracefully, with an exception) if you asked the workers to compute a function you defined within the notebook.

It turns out you can, in principle, take the hack further and have your functions sent over to the workers by manually pickling their code. You can find a pretty cool example of such a hack here.
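To give a rough idea of what "manual pickling of their code" can look like, here is a minimal sketch, assuming Python 3 and a simple function without closures or default arguments. It is not necessarily what the linked example does, and the helper names dump_function, load_function and call_serialized are hypothetical.

```python
import marshal
import pickle
import types


def square(x):
    return x * x


def dump_function(func):
    """Serialize a simple function by value: its name plus its bytecode."""
    return pickle.dumps((func.__name__, marshal.dumps(func.__code__)))


def load_function(payload):
    """Rebuild a function from the bytes produced by dump_function."""
    name, code_bytes = pickle.loads(payload)
    code = marshal.loads(code_bytes)
    return types.FunctionType(code, globals(), name)


def call_serialized(payload, *args):
    """Rebuild the function and call it; a worker would run this helper."""
    return load_function(payload)(*args)


payload = dump_function(square)
rebuilt = load_function(payload)
print(rebuilt(7))
```

For this to help with Pool, a helper like call_serialized would itself have to live in a module the workers can import, with the serialized payload passed as an ordinary argument. Note also that marshal output is specific to the Python version, so the sender and the workers must run the same interpreter.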
