Work around celerybeat being a single point of failure

Problem description

I'm looking for a recommended solution to work around celerybeat being a single point of failure in a celery/rabbitmq deployment. By searching the web, I haven't found anything that made sense so far.

In my case, a once-a-day timed scheduler kicks off a series of jobs that can run for half a day or longer. Since there can only be one celerybeat instance, if something happens to it or to the server it's running on, critical jobs will not be run.

I'm hoping there is already a working solution for this, as I can't be the only one who needs reliable (clustered or the like) scheduler. I don't want to resort to some sort of database-backed scheduler, if I don't have to.

Recommended answer

There is an open issue about this in the celery GitHub repo. I don't know whether they are working on it, though.

As a workaround, you could add a lock to the tasks so that only one instance of a specific PeriodicTask will run at a time.

Something like:

from django.core.cache import cache  # any cache backend shared by all servers

# inside the task body: cache.add() succeeds only if the key is absent
if not cache.add('My-unique-lock-name', True, timeout=lock_timeout):
    return  # another instance already holds the lock

Figuring out the lock timeout is, well, tricky. We're using 0.9 * the task's run_every seconds, since different celerybeats may try to run tasks at slightly different times. The 0.9 just leaves some margin (e.g. when celery falls a little behind schedule once and is then back on schedule, which would otherwise leave the lock still active).
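The 0.9 rule above can be captured in a tiny helper (`lock_timeout` is a hypothetical name, not part of celery; it just implements the margin described):

```python
# Hypothetical helper: derive a lock timeout from a task's run_every
# interval, keeping a 10% margin so a slightly-late beat does not hit
# a lock still held from the previous, on-time run.
def lock_timeout(run_every_seconds, margin=0.9):
    return margin * run_every_seconds

timeout = lock_timeout(3600)  # for an hourly task: 3240 seconds
```
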

Then you can run a celerybeat instance on all machines. Each task will be queued by every celerybeat instance, but only one of the duplicates will finish the run.
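A stdlib-only simulation of that behaviour (FakeCache is a stand-in for a shared cache backend, mimicking `cache.add()` semantics; the task name and timeout are illustrative):

```python
# add() returns True only for the first caller of a given key, so of the
# duplicate queue entries that every celerybeat instance produced,
# exactly one finishes the run.
class FakeCache:
    def __init__(self):
        self._store = {}

    def add(self, key, value, timeout=None):
        if key in self._store:
            return False
        self._store[key] = value
        return True

cache = FakeCache()

def my_periodic_task():
    if not cache.add('My-unique-lock-name', True, timeout=3240):
        return 'skipped'  # a duplicate; another instance holds the lock
    return 'ran'

# three celerybeat instances each queued a copy of the task
results = [my_periodic_task() for _ in range(3)]
```
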

Tasks will still respect run_every this way - worst case scenario: tasks will run at 0.9*run_every speed.

One issue with this approach: if tasks were queued but not processed at the scheduled time (for example because the queue processors were unavailable), the lock may be placed at the wrong time, possibly causing the next task to simply not run. To work around this, you would need some kind of detection mechanism for whether a task is more or less on time.
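One possible shape for that detection mechanism (a sketch; `roughly_on_time` and its tolerance are assumptions, not anything celery provides): before taking the lock, compare the current time with the scheduled time and skip the lock if the task is running far too late.

```python
import time

# Hypothetical on-time check: only take the lock if we are within some
# tolerance of the scheduled timestamp, so a very late duplicate does
# not grab a lock that would block the *next* scheduled run.
def roughly_on_time(scheduled_ts, tolerance_seconds=60, now=None):
    now = time.time() if now is None else now
    return abs(now - scheduled_ts) <= tolerance_seconds
```
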

Still, this shouldn't be a common situation in production.

Another solution is to subclass the celerybeat Scheduler and override its tick method. Then, for every tick, acquire a lock before processing tasks. This makes sure that celerybeats with the same periodic tasks won't queue the same tasks multiple times: only one celerybeat per tick (the one that wins the race) will queue tasks. If one celerybeat goes down, another one will win the race on the next tick.
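A stdlib-only simulation of the tick-level race (in a real deployment you would subclass `celery.beat.Scheduler` and use a shared cache backend; `SharedCache`, the key format, and the beat names here are illustrative):

```python
# Stand-in for a replicated cache: add() succeeds only for the first
# caller of a key, which is what decides the per-tick race.
class SharedCache:
    def __init__(self):
        self._keys = set()

    def add(self, key):
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

def tick(beat_name, tick_no, cache, queued):
    # Only the beat that wins the race for this tick's lock queues
    # tasks; the losers do nothing, so nothing is queued twice.
    if cache.add('beat-tick-%d' % tick_no):
        queued.append((tick_no, beat_name))

cache, queued = SharedCache(), []
for beat in ('beat-a', 'beat-b'):  # tick 1: both beats are alive
    tick(beat, 1, cache, queued)
tick('beat-b', 2, cache, queued)   # tick 2: beat-a has gone down
# exactly one beat queues tasks per tick, and beat-b takes over
```
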

This of course can be used in combination with the first solution.

Of course, for this to work the cache backend needs to be replicated and/or shared across all of the servers.

It's an old question, but I hope it helps someone.
