Using Python multiprocessing Pool in the terminal and in code modules for Django or Flask

Question

When using multiprocessing.Pool in python with the following code, there is some bizarre behavior.

from multiprocessing import Pool
p = Pool(3)
def f(x): return x
threads = [p.apply_async(f, [i]) for i in range(20)]
for t in threads:
    try: print(t.get(timeout=1))
    except Exception: pass

I get the following error three times (one for each thread in the pool), and it prints "3" through "19":

AttributeError: 'module' object has no attribute 'f'

The first three apply_async calls never return.

Meanwhile, if I try:

from multiprocessing import Pool
p = Pool(3)
def f(x): print(x)
p.map(f, range(20))

I get the AttributeError three times, the shell prints "6" through "19", and then it hangs and cannot be killed with [Ctrl] + [C].

The multiprocessing docs have the following to say:

Functionality within this package requires that the main module be importable by the children.

What does this mean?

To clarify, I'm running code in the terminal to test functionality, but ultimately I want to be able to put this into modules of a web server. How do you properly use multiprocessing.Pool in the python terminal and in code modules?

Recommended answer

What this means is that a pool must be initialized after the functions to be run on it have been defined. Using pools within if __name__ == "__main__": blocks works if you are writing a standalone script, but this isn't possible in larger code bases or in server code (such as a Django or Flask project). So, if you're trying to use pools in one of these, make sure to follow these guidelines, which are explained in the sections below:

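  1. Initialize pools inside functions whenever possible. If you have to initialize them in the global scope, do so at the bottom of the module.
  2. Do not call the methods of a pool in the global scope of a module.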
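
The two guidelines can be sketched as follows. This is only an illustration, not code from the original answer: the names `square` and `parallel_map` are hypothetical. The pool is created inside a function, so its workers are only started after every module-level definition already exists.

```python
from multiprocessing import Pool


def square(x):
    # Defined at module level so that worker processes can look it up by name.
    return x * x


def parallel_map(func, items, workers=3):
    # Guideline 1: the pool is created inside a function, at call time.
    # Guideline 2: pool methods are only called from inside a function as well.
    with Pool(workers) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    # Standalone-script usage; a web view or task could call parallel_map directly.
    print(parallel_map(square, range(5)))
```

Creating a pool per call is the simplest arrangement, but starting worker processes is not free, so it suits occasional batch work better than code that runs on every request.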

Alternatively, if you only need better parallelism on I/O (like database accesses or network calls), you can save yourself all this headache and use a pool of threads instead of a pool of processes. This involves the historically undocumented:

from multiprocessing.pool import ThreadPool

Its interface is exactly the same as that of Pool, but since it uses threads and not processes, it comes with none of the caveats that process pools do. The only downside is that you don't get true parallelism of code execution, just parallelism in blocking I/O.
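
As a rough sketch of that point, the snippet below creates a ThreadPool before the task function is even defined, which is the ordering that breaks a process pool. The function `fake_io` is a hypothetical stand-in for a blocking database or network call.

```python
import time
from multiprocessing.pool import ThreadPool

# Created BEFORE the task function exists. Threads share the interpreter's
# memory, so nothing has to be pickled or re-imported by the workers.
pool = ThreadPool(4)


def fake_io(x):
    # Stand-in for blocking I/O such as a query or an HTTP request.
    time.sleep(0.05)
    return x * 2


results = pool.map(fake_io, range(8))
pool.close()
pool.join()
print(results)
```

The sleeps overlap across the four threads, which is the kind of speed-up to expect for I/O-bound work. CPU-bound functions would still run one at a time.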

The inscrutable text from the Python docs means that at the time the pool is created, the surrounding module is imported by the worker processes in the pool. In the case of the Python terminal, this means all and only the code you have run so far.

So, any functions you want to use in the pool must be defined before the pool is initialized. This is true both of code in a module and code in the terminal. The following modifications of the code in the question will work fine:

from multiprocessing import Pool
def f(x): return x  # FIRST
p = Pool(3) # SECOND
threads = [p.apply_async(f, [i]) for i in range(20)]
for t in threads:
    try: print(t.get(timeout=1))
    except Exception: pass

from multiprocessing import Pool
def f(x): print(x)  # FIRST
p = Pool(3) # SECOND
p.map(f, range(20))

By fine, I mean fine on Unix. Windows has its own problems, which I'm not going into here.

But wait, there's more (to using pools in modules that you want to import elsewhere)!

If you define a pool inside a function, you have no problems. But if you are using a Pool object as a global variable in a module, it must be defined at the bottom of the module, not the top. Though this goes against most good code style, it is necessary for functionality. The way to use a pool declared at the top of a module is to only use it with functions imported from other modules, like so:

from multiprocessing import Pool
from other_module import f
p = Pool(3)
p.map(f, range(20))

Importing a pre-configured pool from another module is pretty horrific, as the import must come after whatever you want to run on it, like so:

### module.py ###
from multiprocessing import Pool
POOL = Pool(5)

### module2.py ###
def f(x):
    # Some function (placeholder body so the module is valid Python)
    return x
from module import POOL
POOL.map(f, range(10))

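Second, if you run anything on the pool in the global scope of a module that is being imported, the system hangs. That is, this does not work: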

### module.py ###
from multiprocessing import Pool
def f(x): return x
p = Pool(1)
print(p.map(f, range(5)))

### module2.py ###
import module

This, however, does work, as long as nothing imports module2:

### module.py ###
from multiprocessing import Pool

def f(x): return x
p = Pool(1)
def run_pool(): print(p.map(f, range(5)))

### module2.py ###
import module
module.run_pool()

Now, the reasons behind this are only more bizarre, and likely related to the reason that the code in the question only spits out an AttributeError once per worker and after that appears to execute code properly. It also appears that the pool's worker processes (at least with some reliability) reload the code in the module after executing.
