Python, WSGI, multiprocessing and shared data

Question

I am a bit confused about the multiprocessing feature of mod_wsgi and about the general design of WSGI applications that will be executed on WSGI servers with multiprocessing ability.

Consider the following directive:

WSGIDaemonProcess example processes=5 threads=1

If I understand correctly, mod_wsgi will spawn 5 Python (e.g. CPython) processes and any of these processes can receive a request from a user.

The documentation says that:

Where shared data needs to be visible to all application instances, regardless of which child process they execute in, and changes made to the data by one application are immediately available to another, including any executing in another child process, an external data store such as a database or shared memory must be used. Global variables in normal Python modules cannot be used for this purpose.

But in that case it becomes really heavyweight to make sure that an app runs under any WSGI conditions (including multiprocessing ones).

For example, a simple variable which contains the current number of connected users - should it be read/written in a process-safe way from/to memcached, or a DB, or (if such out-of-the-standard-library mechanisms are available) shared memory?

And will the code like

counter = 0

@app.route('/login')
def login():
    global counter
    ...
    counter += 1
    ...

@app.route('/logout')
def logout():
    global counter
    ...
    counter -= 1
    ...

@app.route('/show_users_count')
def show_users_count():
    return str(counter)

behave unpredictably in a multiprocessing environment?

Thank you!

Solution

There are several aspects to consider in your question.

First, the interaction between Apache MPMs and mod_wsgi applications. If you run the mod_wsgi application in embedded mode (no WSGIDaemonProcess needed, WSGIProcessGroup %{GLOBAL}), you inherit multiprocessing/multithreading from the Apache MPM. This should be the fastest option, and you end up having multiple processes and multiple threads per process, depending on your MPM configuration. On the contrary, if you run mod_wsgi in daemon mode, with WSGIDaemonProcess <name> [options] and WSGIProcessGroup <name>, you have fine control over multiprocessing/multithreading at the cost of a small overhead.

Within a single apache2 server you may define zero, one, or more named WSGIDaemonProcess groups, and each application can be run in one of these process groups (WSGIProcessGroup <name>) or run in embedded mode with WSGIProcessGroup %{GLOBAL}.
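As a sketch, a daemon-mode setup matching the directive from the question might look like this in the Apache configuration (the mount point and filesystem paths are hypothetical examples):

```apache
# Hypothetical httpd.conf fragment: one daemon process group with
# 5 single-threaded processes, and one application bound to that group.
WSGIDaemonProcess example processes=5 threads=1
WSGIScriptAlias /app /var/www/example/app.wsgi

<Location /app>
    WSGIProcessGroup example
</Location>
```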

You can check whether you are running with multiprocessing/multithreading by inspecting the wsgi.multithread and wsgi.multiprocess variables in the WSGI environ.
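A minimal sketch of such a check, as a plain WSGI app (both flags are set by the server, mod_wsgi included):

```python
# Minimal WSGI app that reports the concurrency model the server runs it under.
def application(environ, start_response):
    body = 'multiprocess: {}, multithread: {}'.format(
        environ.get('wsgi.multiprocess'),
        environ.get('wsgi.multithread'),
    ).encode('ascii')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

With WSGIDaemonProcess example processes=5 threads=1 you would expect it to report multiprocess: True and multithread: False.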

With your configuration, WSGIDaemonProcess example processes=5 threads=1, you have 5 independent processes, each with a single thread of execution: no shared global data, no shared memory, since you are not in control of spawning subprocesses; mod_wsgi is doing it for you. To share a global state you already listed some possible options: a DB to which your processes interface, some sort of file system based persistence, a daemon process (started outside apache) and socket based IPC.
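The "no global data" point can be demonstrated outside Apache with a short sketch: every process gets its own copy of a module-level variable, so increments made in child processes never reach the parent (or each other):

```python
import multiprocessing

counter = 0  # module-level global: every process gets its own copy

def increment():
    global counter
    counter += 1  # modifies only this process's copy

if __name__ == '__main__':
    workers = [multiprocessing.Process(target=increment) for _ in range(5)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # All five children incremented their own copies; the parent's is untouched.
    print(counter)  # prints 0, not 5
```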

As pointed out by Roland Smith, the latter could be implemented using the high-level API provided by multiprocessing.managers: outside Apache you create and start a BaseManager server process

import multiprocessing.managers

m = multiprocessing.managers.BaseManager(address=('', 12345), authkey=b'secret')
m.get_server().serve_forever()

and inside your apps you connect:

import multiprocessing.managers

m = multiprocessing.managers.BaseManager(address=('', 12345), authkey=b'secret')
m.connect()

The example above is a dummy one, since m has no useful methods registered, but in the multiprocessing.managers documentation (python docs) you will find how to create and proxy an object (like the counter in your example) among your processes.
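Putting it together, a hedged sketch of such a shared counter (the class names, port and authkey are illustrative choices, not from the original answer). The server process owns the single Counter instance; each WSGI process connects once and works through a proxy, so all 5 processes see one consistent value:

```python
# shared_counter.py: a counter owned by a BaseManager server process.
# Port 12345 and the authkey are example values; pick your own.
from multiprocessing.managers import BaseManager

class Counter:
    """Lives in the manager's server process; all clients share it."""
    def __init__(self):
        self._value = 0

    def increment(self):
        self._value += 1
        return self._value

    def decrement(self):
        self._value -= 1
        return self._value

    def value(self):
        return self._value

_counter = Counter()

class CounterManager(BaseManager):
    pass

# The server resolves get_counter() to the single shared instance;
# clients calling the same name receive a proxy to it.
CounterManager.register('get_counter', callable=lambda: _counter)

def connect_counter(address=('127.0.0.1', 12345), authkey=b'secret'):
    """What each WSGI process would do once: connect and fetch a proxy."""
    manager = CounterManager(address=address, authkey=authkey)
    manager.connect()
    return manager.get_counter()

if __name__ == '__main__':
    # Run as a standalone server process, outside Apache.
    CounterManager(address=('', 12345), authkey=b'secret').get_server().serve_forever()
```

In the app, login() would call connect_counter() once at startup and then counter.increment() / counter.decrement() per request; the method calls execute in the server process, which serializes them.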

A final comment on your example, with processes=5 threads=1. I understand that this is just an example, but in real world applications I suspect that performance will be comparable to processes=1 threads=5: you should go into the intricacies of sharing data across processes only if the expected performance boost over the 'single process, many threads' model is significant.
