Distributed Celery scheduler


Problem description

I'm looking for a distributed cron-like framework for Python, and found Celery. However, the docs say: "You have to ensure only a single scheduler is running for a schedule at a time, otherwise you would end up with duplicate tasks." Celery uses celery.beat.PersistentScheduler, which stores the schedule in a local file.
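For reference, a beat schedule is declared in the Celery app's configuration. The following is an illustrative config fragment, not code from the question: the app name "proj", the broker URL, and the task "proj.tasks.cleanup" are all placeholders.

```python
# Illustrative Celery beat configuration (all names are placeholders).
from celery import Celery
from celery.schedules import crontab

app = Celery("proj", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "nightly-cleanup": {
        "task": "proj.tasks.cleanup",
        "schedule": crontab(hour=3, minute=0),  # every day at 03:00
    },
}
# By default, celery.beat.PersistentScheduler persists last-run times to a
# local shelve file ("celerybeat-schedule"), so the scheduler state lives on
# one host rather than being shared across a cluster.
```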

So, my question: is there an implementation other than the default that can put the schedule "into the cluster" and coordinate task execution so that each task runs only once? My goal is to be able to run celerybeat with identical schedules on all hosts in the cluster.

Thanks

Answer

tl;dr: No, Celerybeat is not suitable for your use case. You have to run exactly one process of celerybeat, otherwise your tasks will be duplicated.

I know this is a very old question. I will try to make a small summary because I had the same problem/question (in 2018).

Some background: we're running a Django application (with Celery) in a Kubernetes cluster. The cluster (EC2 instances) and the Pods (~containers) are autoscaled: simply put, I do not know when and how many instances of the application are running.

It's your responsibility to run only one process of celerybeat, otherwise your tasks will be duplicated. [1] There was a feature request for this in the Celery repository: [2]

Requiring the user to ensure that only one instance of celerybeat exists across their cluster creates a substantial implementation burden (either creating a single point-of-failure or encouraging users to roll their own distributed mutex).

celerybeat should either provide a mechanism to prevent inadvertent concurrency, or the documentation should suggest a best-practice approach.

After some time, this feature request was rejected by the author of Celery, citing a lack of resources. [3] I highly recommend reading the entire thread on GitHub. People there recommend several projects/solutions (see the thread for the list).

I did not try anything from the above (I do not want another dependency in my app, and I do not like locking tasks /you need to deal with fail-over, etc./).
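For completeness, the "locking tasks" approach the answer avoided usually boils down to a per-time-window distributed mutex: every host fires the task, but only the first one to acquire the lock for the current window actually runs it (with Redis this is typically `SET key value NX EX ttl`). Below is a minimal runnable sketch of the idea, with an in-memory store standing in for Redis so it has no extra dependencies; all names are hypothetical.

```python
# Sketch: making a periodic task run once per window via a "distributed" lock.
# A plain dict stands in for Redis here; in production, LockStore.acquire
# would be a single atomic Redis call (SET key value NX EX ttl).
import time


class LockStore:
    def __init__(self):
        self._locks = {}  # key -> expiry timestamp

    def acquire(self, key, ttl):
        """Return True if the lock was taken; False if someone holds it."""
        now = time.monotonic()
        expires = self._locks.get(key)
        if expires is not None and expires > now:
            return False
        self._locks[key] = now + ttl
        return True


def run_once(store, task_name, period, fn):
    """Run fn() at most once per `period`-second window across all callers."""
    window = int(time.time() // period)
    key = f"{task_name}:{window}"
    if store.acquire(key, ttl=period):
        return fn()
    return None  # another instance already ran this window


store = LockStore()
results = [run_once(store, "report", 60, lambda: "ran") for _ in range(3)]
# Only the first invocation in the window runs: ["ran", None, None]
```

Note the fail-over caveat from the answer: if the instance holding the lock dies mid-task, the window is simply skipped until the TTL expires, which is exactly the kind of edge case the author preferred not to manage.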

I ended up using a CronJob in Kubernetes (https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/).
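A minimal CronJob manifest for this setup might look like the sketch below (the name, image, and command are placeholders, not details from the answer); `concurrencyPolicy: Forbid` is what prevents overlapping runs, which is the cluster-level analogue of "only one celerybeat":

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup          # placeholder name
spec:
  schedule: "0 3 * * *"          # every day at 03:00
  concurrencyPolicy: Forbid      # never start a run while one is in progress
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: myapp:latest  # placeholder image
            command: ["python", "manage.py", "cleanup"]
          restartPolicy: OnFailure
```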

[1] celerybeat - multiple instances & monitoring

[2] https://github.com/celery/celery/issues/251

[3] https://github.com/celery/celery/issues/251#issuecomment-228214951

