How to return data from a CherryPy BackgroundTask running as fast as possible

Question

I'm building a web service for iterative batch processing of data using CherryPy. The ideal workflow is as follows:

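  1. Users post data to the service for processing.
  2. When the processing job is free, it collects the queued data and starts another iteration.
  3. While the job is processing, users post more data to the queue for the next iteration.
  4. Once the current iteration finishes, the results are passed back so that users can fetch them through the same API.
  5. The job starts again with the next batch of queued data.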

The key consideration here is that the processing should run as fast as possible, with each iteration starting as soon as the previous one finishes, regardless of the amount of data in the queue. There's no upper bound on how long each iteration can take, so I can't create a fixed schedule for it to run on.
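The back-to-back iteration described above can be sketched with the standard library alone. This is only an illustration of the batching logic, not CherryPy or RQ code: `drain`, `run_iterations`, `process_batch`, and `max_iterations` are hypothetical names introduced here, and the sketch stops when the queue is empty where a real service would wait for more data.

```python
import queue


def drain(pending):
    """Collect everything currently queued, without blocking."""
    batch = []
    while True:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            return batch


def run_iterations(pending, process_batch, max_iterations):
    """Run iterations back to back; each takes whatever is queued right then."""
    results = []
    for _ in range(max_iterations):
        batch = drain(pending)
        if not batch:
            break  # nothing queued; a real service would wait here instead
        results.append(process_batch(batch))
    return results


if __name__ == "__main__":
    pending = queue.Queue()
    for value in (4, 5):
        pending.put(value)
    # sum stands in for the real (hypothetical) batch-processing function
    print(run_iterations(pending, sum, max_iterations=3))
```

Each iteration starts as soon as the previous one returns, and the batch size is simply whatever accumulated in the meantime, so no fixed schedule is involved.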

There are a few examples of using BackgroundTask (like this one), but I've yet to find one that deals with returning data, or one that deals with tasks running as fast as possible as opposed to on a fixed schedule.

I'm not wedded to the BackgroundTask solution, so if anyone can offer an alternative I'd be more than happy. It feels like there's a solution within the framework, though.

Recommended answer

Don't run the processing with BackgroundTask: it runs in a thread inside the CherryPy process, and because of the GIL, CPU-bound work there competes with the threads that serve requests, so CherryPy can become slow to answer new requests. Use a queue solution that runs your background tasks in a different process, such as Celery or RQ.

I'm going to work through an example using RQ in detail. RQ uses Redis as its message broker, so first of all you need to install and start Redis.

Then create a module (mytask in my example) containing the long-running background functions:

import time


def long_running_task(value):
    # Simulate slow work; the return value is stored by RQ as the job result.
    time.sleep(15)
    return len(value)

Start one RQ worker (or more than one if you want to run tasks in parallel). The Python interpreter running the workers must be able to import your mytask module, so export PYTHONPATH before starting the worker if the module is not already on the path:

rq worker

Below is a very simple CherryPy webapp that shows how to use the RQ queue:

from html import escape

import cherrypy
from redis import Redis
from rq import Queue
from mytask import long_running_task


class BackgroundTasksWeb(object):

    def __init__(self):
        # Default Redis connection and the default RQ queue.
        self.queue = Queue(connection=Redis())
        # Jobs are only remembered in memory, so the list is lost on restart.
        self.jobs = []

    @cherrypy.expose
    def index(self):
        # Submission form plus an iframe showing the auto-refreshing job list.
        html =  ['<html>', '<body>']
        html += ['<form action="job">', '<input name="q" type="text" />', '<input type="submit" />', "</form>"]
        html += ['<iframe width="100%" src="/results"></iframe>']
        html += ['</body>', '</html>']
        return '\n'.join(html)

    @cherrypy.expose
    def results(self):
        # The page reloads every 2 seconds; user input is escaped before display.
        html = ['<html>', '<head>', '<meta http-equiv="refresh" content="2" >', '</head>', '<body>']
        html += ['<ul>']
        html += ['<li>job:{} status:{} result:{} input:{}</li>'.format(j.get_id(), j.get_status(), j.result, escape(str(j.args[0]))) for j in self.jobs]
        html += ['</ul>']
        html += ['</body>', '</html>']
        return '\n'.join(html)

    @cherrypy.expose
    def job(self, q):
        # enqueue() returns immediately; an RQ worker process runs the task.
        job = self.queue.enqueue(long_running_task, q)
        self.jobs.append(job)
        raise cherrypy.HTTPRedirect("/")


if __name__ == '__main__':
    cherrypy.quickstart(BackgroundTasksWeb())

In a production webapp I would use the Jinja2 template engine to generate the HTML, and most likely WebSockets to update the job status in the web browser.
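Since the question asks for results to come back through the same API, one option is to return job data as JSON rather than HTML. The following is a minimal sketch under stated assumptions, not part of the original answer: `job_summary` is a hypothetical helper, and `fake_job` is a stand-in with made-up values so the snippet runs without Redis. It relies only on the `get_id()`, `get_status()`, and `result` members already used in the webapp.

```python
import json
from types import SimpleNamespace


def job_summary(job):
    """Reduce a job-like object to JSON-friendly fields (hypothetical helper)."""
    return {
        "id": job.get_id(),
        "status": job.get_status(),
        "result": job.result,  # None until the worker has finished
    }


# Stand-in for an RQ job so the sketch runs without Redis (made-up values).
fake_job = SimpleNamespace(
    get_id=lambda: "abc123",
    get_status=lambda: "finished",
    result=5,
)

if __name__ == "__main__":
    print(json.dumps(job_summary(fake_job)))
```

In the webapp, an extra exposed handler decorated with `cherrypy.tools.json_out()` could return `[job_summary(j) for j in self.jobs]`, letting clients poll for results with plain HTTP requests.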
