多核/多处理器上的TwistedWeb [英] TwistedWeb on multicore/multiprocessor

查看:132
本文介绍了多核/多处理器上的TwistedWeb的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在运行TwistedWeb服务器时,人们使用哪些技术来利用多个处理器/内核?有推荐的方法吗?

What techniques are people using to utilize multiple processors/cores when running a TwistedWeb server? Is there a recommended way of doing it?

我基于twisted.web的Web服务在Amazon EC2实例上运行,该实例通常具有多个CPU核心(8、16),并且该服务正在执行的工作类型得益于额外的处理能力,所以我会非常喜欢使用它.

My twisted.web based web service is running on Amazon EC2 instances, which often have multiple CPU cores (8, 16), and the type of work that the service is doing benefits from extra processing power, so i would very much like to use that.

我了解可以在多个Twisted实例之前使用haproxy,squid或配置为反向代理的Web服务器.实际上,我们目前正在使用这样的设置,其中nginx充当在同一主机上运行但在不同端口上运行的多个上游twisted.web服务的反向代理.

I understand that it is possible to use haproxy, squid or a web server, configured as a reverse proxy, in front of multiple instances of Twisted. In fact, we are currently using such a setup, with nginx serving as a reverse proxy to several upstream twisted.web services running on the same host, but each on different port.

这很好,但是我真正感兴趣的是一种没有前端"服务器的解决方案,但是所有扭曲的进程都以某种方式绑定到同一套接字并接受请求.这样的事情甚至有可能...还是我疯了?操作系统是Linux(CentOS).

This works fine, but what i'm really interested in, is a solution where there is no "front-facing" server, but all twistd processes somehow bind to the same socket and accept requests. Is such thing even possible... or am i being crazy? The operating system is Linux (CentOS).

谢谢.

安东.

推荐答案

有多种方法可以支持Twisted应用程序的多进程操作.但是,一开始要回答的一个重要问题是您期望的并发模型是什么,以及您的应用程序如何处理共享状态.

There are a number of ways to support multiprocess operation for a Twisted application. One important question to answer at the start, though, is what you expect your concurrency model to be, and how your application deals with shared state.

在单个进程Twisted应用程序中,并发是所有协作的(在Twisted的异步I/O API的帮助下),并且共享状态可以保存在Python对象可以使用的任何地方.您的应用程序代码会运行,但要知道,直到放弃控制,其他任何东西都不会运行.此外,您的应用程序中想要访问某些共享状态的任何部分都可以很容易地做到这一点,因为该状态可能保存在易于访问的无聊的旧Python对象中.

In a single process Twisted application, concurrency is all cooperative (with help from Twisted's asynchronous I/O APIs) and shared state can be kept anywhere a Python object would go. Your application code runs knowing that, until it gives up control, nothing else will run. Additionally, any part of your application that wants to access some piece of shared state can probably do so quite easily, since that state is probably kept in a boring old Python object that is easy to access.

当您有多个进程时,即使它们都在运行基于Twisted的应用程序,也有两种并发形式.一个与前面的情况相同-在特定过程中,并发是协作的.但是,您有一种新的类型,其中正在运行多个进程.平台的进程调度程序可能会随时在这些进程之间切换执行,并且您对此几乎没有控制权(并且几乎看不到它何时发生).它甚至可能安排您的两个进程同时在不同的内核上运行(这甚至可能是您所希望的).这意味着您无法保证一致性,因为一个进程不知道第二个进程何时会出现并尝试在某种共享状态下运行.这导致了另一个需要考虑的重要领域,即如何在流程之间实际共享状态.

When you have multiple processes, even if they're all running Twisted-based applications, then you have two forms of concurrency. One is the same as for the previous case - within a particular process, the concurrency is cooperative. However, you have a new kind, where multiple processes are running. Your platform's process scheduler might switch execution between these processes at any time, and you have very little control over this (as well as very little visibility into when it happens). It might even schedule two of your processes to run simultaneously on different cores (this is probably even what you're hoping for). This means that you lose some guarantees about consistency, since one process doesn't know when a second process might come along and try to operate on some shared state. This leads in to the other important area of consideration, how you will actually share state between the processes.

与单一流程模型不同,您不再拥有任何方便且易于访问的位置来存储所有代码可以到达的状态.如果将其放在一个进程中,则该进程中的所有代码都可以像普通的Python对象一样轻松地访问它,但是在您的任何其他进程中运行的任何代码都不再具有对它的轻松访问.您可能需要找到一个RPC系统,以使您的进程相互通信.或者,您可以设计流程划分,以便每个流程仅接收需要存储在该流程中的状态的请求.这样的示例可能是带有会话的网站,其中有关用户的所有状态都存储在他们的会话中,并且他们的会话由cookie标识.前端进程可以接收Web请求,检查cookie,查找哪个后端进程负责该会话,然后将请求转发到该后端进程.这种方案意味着后端通常不需要通信(只要您的Web应用程序足够简单,即只要用户彼此之间不交互或对共享数据进行操作即可).

Unlike the single process model, you no longer have any convenient, easily accessed places to store your state where all your code can reach it. If you put it in one process, all the code in that process can access it easily as a normal Python object, but any code running in any of your other processes no longer has easy access to it. You might need to find an RPC system to let your processes communicate with each other. Or, you might architect your process divide so that each process only receives requests which require state stored in that process. An example of this might be a web site with sessions, where all state about a user is stored in their session, and their sessions are identified by cookies. A front-end process could receive web requests, inspect the cookie, look up which back-end process is responsible for that session, and then forward the request on to that back-end process. This scheme means that back-ends typically don't need to communicate (as long as your web application is sufficiently simple - ie, as long as users don't interact with each other, or operate on shared data).

请注意,在该示例中,预分叉模型不合适.前端进程必须专门拥有侦听端口,以便它可以在后端进程处理所有传入请求之前检查所有传入请求.

Note that in that example, a pre-forking model is not appropriate. The front-end process must exclusively own the listening port so that it can inspect all incoming requests before they are handled by a back-end process.

当然,有许多类型的应用程序,还有许多其他用于管理状态的模型.选择正确的模型进行多处理需要首先了解哪种并发对您的应用程序有意义,以及如何管理应用程序的状态.

Of course, there are many types of application, with many other models for managing state. Selecting the right model for multi-processing requires first understanding what kind of concurrency makes sense for your application, and how you can manage your application's state.

话虽这么说,但有了非常新的Twisted版本(目前尚未发布),在多个进程之间共享侦听TCP端口非常容易.这是一个代码段,演示了您可以使用一些新API来完成此操作的一种方式:

That being said, with very new versions of Twisted (unreleased as of this point), it's quite easy to share a listening TCP port amongst multiple processes. Here is a code snippet which demonstrates one way you might use some new APIs to accomplish this:

from os import environ
from sys import argv, executable
from socket import AF_INET

from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.static import File

def main(fd=None):
    root = File("/var/www")
    factory = Site(root)

    if fd is None:
        # Create a new listening port and several other processes to help out.                                                                     
        port = reactor.listenTCP(8080, factory)
        for i in range(3):
            reactor.spawnProcess(
                    None, executable, [executable, __file__, str(port.fileno())],
                childFDs={0: 0, 1: 1, 2: 2, port.fileno(): port.fileno()},
                env=environ)
    else:
        # Another process created the port, just start listening on it.                                                                            
        port = reactor.adoptStreamPort(fd, AF_INET, factory)

    reactor.run()


if __name__ == '__main__':
    if len(argv) == 1:
        main()
    else:
        main(int(argv[1]))

对于较旧的版本,有时可以使用fork共享端口.但是,这很容易出错,在某些平台上会失败,并且不是使用Twisted的受支持方式:

With older versions, you can sometimes get away with using fork to share the port. However, this is rather error prone, fails on some platforms, and isn't a supported way to use Twisted:

from os import fork

from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.static import File

def main():
    root = File("/var/www")
    factory = Site(root)

    # Create a new listening port
    port = reactor.listenTCP(8080, factory)

    # Create a few more processes to also service that port
    for i in range(3):
        if fork() == 0:
            # Proceed immediately onward in the children.
            # The parent will continue the for loop.
            break

    reactor.run()


if __name__ == '__main__':
    main()

之所以起作用是因为fork的正常行为,在该行为中,新创建的进程(子进程)继承了原始进程(父进程)的所有内存和文件描述符.由于以其他方式隔离了进程,所以这两个进程不会相互干扰,至少就它们正在执行的Python代码而言不会.由于文件描述符是继承的,因此父级或任何子级都可以接受端口上的连接.

This works because of the normal behavior of fork, where the newly created process (the child) inherits all of the memory and file descriptors from the original process (the parent). Since processes are otherwise isolated, the two processes don't interfere with each other, at least as far as the Python code they are executing goes. Since the file descriptors are inherited, either the parent or any of the children can accept connections on the port.

由于转发HTTP请求是如此简单,因此我怀疑您会注意到使用这些技术中的任何一种都能大大提高性能.前者比代理要好一些,因为它简化了部署并更轻松地用于非HTTP应用程序.后者可能是更多的责任,不值得接受.

Since forwarding HTTP requests is such an easy task, I doubt you'll notice much of a performance improvement using either of these techniques. The former is a bit nicer than proxying, because it simplifies your deployment and works for non-HTTP applications more easily. The latter is probably more of a liability than it's worth accepting.

这篇关于多核/多处理器上的TwistedWeb的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆