Using Celery for Realtime, Synchronous External API Querying with Gevent


Question

I'm working on a web application that will receive a request from a user and have to hit a number of external APIs to compose the answer to that request. This could be done directly from the main web thread using something like gevent to fan out the request.

Alternatively, I was thinking, I could put incoming requests into a queue and use workers to distribute the load. The idea would be to try to keep it real time, while splitting up the requests amongst several workers. Each of these workers would be querying only one of the many external APIs. The response they receive would then go through a series of transformations, be saved into a DB, be transformed to a common schema and saved in a common DB, to finally be composed into one big response that would be returned through the web request. The web request is most likely going to be blocking all this time, with a user waiting, so keeping the queueing and dequeueing as fast as possible is important.

The external API calls can easily be turned into individual tasks. I think the linking from one API task to a transformation to a DB saving task could be done using a chain, etc., and the final result combining all results could be returned to the web thread using a chord.
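
For illustration, here is a minimal sketch of that idea using celery primitives; the app, the broker URLs, and the task names (fetch_api_a, transform, save_to_db, compose_response) are placeholders I've made up, not anything prescribed in the question:

    from celery import Celery, chain, chord

    # Hypothetical app; broker/backend URLs are placeholders.
    app = Celery('proj', broker='redis://localhost:6379/0',
                 backend='redis://localhost:6379/1')

    @app.task
    def fetch_api_a(request_id):
        # Hit one external API (placeholder body).
        return {'source': 'a', 'request_id': request_id, 'payload': '...'}

    @app.task
    def transform(raw):
        # Normalize the raw API response to the common schema (placeholder).
        return raw

    @app.task
    def save_to_db(doc):
        # Persist the normalized document and pass it along (placeholder).
        return doc

    @app.task
    def compose_response(results):
        # Chord callback: combine all per-API results into one response.
        return {'combined': results}

    # One fetch -> transform -> save chain per external API, joined by a
    # chord whose callback composes the final answer for the web request.
    pipelines = [chain(fetch_api_a.s(42), transform.s(), save_to_db.s())]
    final = chord(pipelines)(compose_response.s()).get(timeout=30)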

Some questions:

  • Can this (and should this) be done using celery?
  • I'm using django. Should I try to use django-celery over plain celery?
  • Each one of those tasks might spawn off other tasks - such as logging what just happened or other types of branching off. Is this possible?
  • Could tasks be returning the data they get - i.e. potentially Kb of data through celery (redis as underlying in this case) or should they write to the DB, and just pass pointers to that data around?
  • Each task is mostly I/O bound, and was initially just going to use gevent from the web thread to fan out the requests and skip the whole queuing design, but it turns out that it would be reused for a different component. Trying to keep the whole round trip through the Qs real time will probably require many workers making sure the queues are mostly empty. Or is it? Would running the gevent worker pool help with this?
  • Do I have to write gevent specific tasks or will using the gevent pool deal with network IO automagically?
  • Is it possible to assign priority to certain tasks?
  • What about keeping them in order?
  • Should I skip celery and just use kombu?
  • It seems like celery is geared more towards "tasks" that can be deferred and are not time sensitive. Am I nuts for trying to keep this real time?
  • What other technologies should I look at?

Update: Trying to hash this out a bit more. I did some reading on Kombu and it seems to be able to do what I'm thinking of, although at a much lower level than celery. Here is a diagram of what I had in mind.

What seems to be possible with raw queues as accessible with Kombu is the ability for a number of workers to subscribe to a broadcast message. The type and number of workers do not need to be known by the publisher if using a queue. Can something similar be achieved using Celery? It seems like if you want to make a chord, you need to know at runtime which tasks are going to be involved in the chord, whereas in this scenario you can simply add listeners to the broadcast, and make sure they announce they are in the running to add responses to the final queue.
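
As a rough sketch of that broadcast idea with raw kombu (the broker URL, exchange name, and queue name are placeholders; kombu's fanout exchanges deliver a copy of each message to every bound queue):

    from kombu import Connection, Exchange, Queue

    # A fanout exchange copies every published message to all bound queues.
    broadcast = Exchange('api_broadcast', type='fanout')

    def handle(body, message):
        # Each subscribed worker receives its own copy of the message.
        print('received:', body)
        message.ack()

    with Connection('redis://localhost:6379/0') as conn:
        # Worker side: bind a per-worker queue to the fanout exchange.
        with conn.Consumer(Queue('worker_a', exchange=broadcast),
                           callbacks=[handle]):
            # Publisher side: needs no knowledge of how many workers exist.
            conn.Producer().publish({'request_id': 42},
                                    exchange=broadcast, declare=[broadcast])
            conn.drain_events(timeout=5)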

Update 2: I see there is the ability to broadcast. Can you combine this with a chord? In general, can you combine celery with raw kombu? This is starting to sound like a question about smoothies.

Answer

I will try to answer as many of the questions as possible.


Can this (and should this) be done using celery?

Yes, you can.


I'm using django. Should I try to use django-celery over plain celery?

Django has good support for celery and it would make life much easier during development.


Each one of those tasks might spawn off other tasks - such as logging what just happened or other types of branching off. Is this possible?

You can start subtasks from within a task with ignore_result=True for side-effect-only tasks.
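
A minimal sketch of that pattern; the log_event and fetch_api_a task names are hypothetical:

    from celery import shared_task

    @shared_task(ignore_result=True)
    def log_event(message):
        # Side effect only: nothing waits on or stores this result.
        print(message)

    @shared_task
    def fetch_api_a(request_id):
        payload = {'request_id': request_id, 'payload': '...'}
        # Spawn the logging subtask from within this task, fire-and-forget.
        log_event.delay('fetched request %s' % request_id)
        return payload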


Could tasks be returning the data they get - i.e. potentially Kb of data through celery (redis as underlying in this case) or should they write to the DB, and just pass pointers to that data around?

I would suggest putting the results in the DB and then passing the id around; that would make your broker and workers happy. Less data transfer/pickling, etc.
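
A minimal sketch of the id-passing approach; save_response and load_response are hypothetical DB helpers, not celery APIs:

    from celery import shared_task

    def save_response(payload):
        # Hypothetical helper: INSERT the payload into your DB, return its id.
        return 123

    def load_response(row_id):
        # Hypothetical helper: SELECT the payload back by id.
        return {'payload': '...'}

    @shared_task
    def fetch_and_store(request_id):
        payload = {'big': 'api response, potentially many Kb'}
        return save_response(payload)  # only a small id crosses the broker

    @shared_task
    def transform_stored(row_id):
        doc = load_response(row_id)
        # ...transform to the common schema, save again, pass the new id on.
        return save_response(doc)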


Each task is mostly I/O bound, and was initially just going to use gevent from the web thread to fan out the requests and skip the whole queuing design, but it turns out that it would be reused for a different component. Trying to keep the whole round trip through the Qs real time will probably require many workers making sure the queues are mostly empty. Or is it? Would running the gevent worker pool help with this?

Since the process is IO bound, gevent will definitely help here. However, how high the concurrency should be for a gevent pool'd worker is something that I'm looking for an answer to as well.
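
For reference, a worker can be started on the gevent pool from the command line; the app name and concurrency value below are placeholders to tune against your queue depth:

    celery -A proj worker -P gevent --concurrency=100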


Do I have to write gevent specific tasks or will using the gevent pool deal with network IO automagically?

Gevent does the monkey patching automatically when you use it in the pool. But the libraries that you use should play well with gevent. Otherwise, if you're parsing some data with simplejson (which is written in C) then that would block other gevent greenlets.
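
If you need the patching to happen explicitly, e.g. in code that also runs outside the worker's pool, a minimal sketch (urllib here is just a stand-in for whatever HTTP client you use):

    from gevent import monkey
    monkey.patch_all()  # must run before modules that import socket/ssl

    import urllib.request

    def fetch(url):
        # With patched sockets this blocks only the current greenlet,
        # so other greenlets keep running while the request is in flight.
        return urllib.request.urlopen(url, timeout=10).read()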


Is it possible to assign priority to certain tasks?

You cannot assign specific priorities to certain tasks, but you can route them to different queues and then have those queues listened to by varying numbers of workers. The more workers a particular queue has, the higher the effective priority of the tasks on that queue.
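
A minimal sketch of that routing approach, using Celery 4+ setting names and hypothetical task paths, with the app object from the earlier sketch:

    # Route tasks onto dedicated queues; workers then subscribe per queue.
    app.conf.task_routes = {
        'proj.tasks.fetch_api_a': {'queue': 'realtime'},
        'proj.tasks.log_event': {'queue': 'background'},
    }

You would then run something like celery -A proj worker -Q realtime -P gevent --concurrency=50 next to celery -A proj worker -Q background --concurrency=2, so the realtime queue drains faster.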


What about keeping them in order?

Chain is one way to maintain order. Chord is a good way to summarize. Celery takes care of it, so you don't have to worry about it. Even when using the gevent pool, it is in the end possible to reason about the order of task execution.


Should I skip celery and just use kombu?

You can, if your use case will not change to something more complex over time and also if you are willing to manage your processes through celeryd + supervisord by yourself. Also, if you don't care about the task monitoring that comes with tools such as celerymon, flower, etc.


It seems like celery is geared more towards "tasks" that can be deferred and are not time sensitive.

Celery supports scheduled tasks as well, if that is what you meant by that statement.
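
A minimal sketch of a periodic task via celery beat (Celery 4+ setting names; the task path and the 30-second cadence are placeholders):

    from celery import Celery

    app = Celery('proj', broker='redis://localhost:6379/0')

    # celery beat will enqueue this task every 30 seconds.
    app.conf.beat_schedule = {
        'refresh-api-data': {
            'task': 'proj.tasks.fetch_api_a',
            'schedule': 30.0,  # seconds between runs
            'args': (42,),
        },
    }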


Am I nuts for trying to keep this real time?

I don't think so. As long as your consumers are fast enough, it will be as good as real time.


What other technologies should I look at?

Pertaining to celery, you should choose the result store wisely. My suggestion would be to use cassandra. It is good for realtime data (both write- and query-wise). You can also use redis or mongodb; they come with their own set of problems as result stores. But then a little tweaking in configuration can go a long way.
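
Selecting a backend is a configuration choice; a sketch with placeholder URLs, assuming the app object from the earlier sketches (the cassandra lines use Celery 4+ setting names):

    app.conf.result_backend = 'redis://localhost:6379/1'

    # Cassandra instead:
    # app.conf.result_backend = 'cassandra://'
    # app.conf.cassandra_servers = ['127.0.0.1']
    # app.conf.cassandra_keyspace = 'celery'
    # app.conf.cassandra_table = 'results'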

If you mean something completely different from celery, then you can look into asyncio (python3.5) and zeromq for achieving the same. I can't comment more on that though.
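
For completeness, a minimal asyncio sketch of the same fan-out idea (Python 3.5+; fetch_one is a placeholder for a real non-blocking HTTP call):

    import asyncio

    async def fetch_one(name):
        await asyncio.sleep(0.1)  # stands in for a non-blocking API call
        return {'source': name}

    async def compose(request_id):
        # Fan out all API calls concurrently and gather their results.
        results = await asyncio.gather(*(fetch_one(n) for n in 'abc'))
        return {'request_id': request_id, 'results': results}

    print(asyncio.get_event_loop().run_until_complete(compose(42)))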
