Capture Heroku SIGTERM in Celery workers to shut down the worker gracefully


Problem description

I've done a ton of research on this, and I'm surprised I haven't found a good answer to this yet anywhere.

I'm running a large application on Heroku, and I have certain Celery tasks that run for a very long time and save a result at the end. Every time I redeploy on Heroku, it sends SIGTERM (and eventually SIGKILL) and kills my running worker. I'm trying to find a way for the worker instance to shut itself down gracefully and re-queue the task for later processing, so that we eventually save the required result instead of losing the queued task.

I cannot find a way that works to have the worker listen for SIGTERM properly. The closest I've gotten, which works when running python manage.py celeryd directly but NOT when emulating Heroku using foreman, is the following:

@app.task(bind=True, max_retries=1)
def slow(self, x):
    try:
        for x in range(100):
            print 'x: ' + unicode(x)
            time.sleep(10)
    except exceptions.MaxRetriesExceededError:
        logger.error('whoa')
    except (exceptions.WorkerShutdown, exceptions.WorkerTerminate) as exc:
        logger.error(u'retrying, ' + unicode(exc))
        raise self.retry(exc=exc, countdown=10)
    except (KeyboardInterrupt, SystemExit) as exc:
        print 'retrying'
        raise self.retry(exc=exc, countdown=10)
    else:
        return x
    finally:
        logger.info('task ended!')

When I start this celery task running within foreman and hit Ctrl+C, the following happens:

^CSIGINT received
22:20:59 system   | sending SIGTERM to all processes
22:20:59 web.1    | exited with code 0
22:21:04 system   | sending SIGKILL to all processes
Killed: 9

So it's clear that none of the celery exceptions, nor the KeyboardInterrupt or SystemExit exceptions I've seen in other posts, properly catch SIGTERM and shut down the worker.

What is the right way to do this?

Solution

Unfortunately, Celery was not designed to do a clean shutdown. EVER. I mean it. Celery workers do respond to SIGTERM, but if a task is incomplete, the worker processes wait for the task to finish and only then exit. If the workers don't shut down in a reasonable time you can send SIGKILL, but information is lost in that case: you may not know which jobs remained incomplete.
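
To illustrate the window between SIGTERM and SIGKILL, here is a minimal sketch in plain Python 3. It is not Celery's API, and it assumes a Unix-like system and a job that can be split into short steps. The names ShutdownFlag, slow_job and on_step are hypothetical; they come neither from the question nor from Celery. The handler only records that SIGTERM arrived, and the loop checks that flag between steps and returns early so the caller can decide to re-queue the remaining work. Whether and how this can be wired into a Celery worker process is not something the answer above covers.

```python
import os
import signal
import time


class ShutdownFlag(object):
    """Hypothetical helper: remembers that SIGTERM was received."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler tiny: only record the request.
        self.requested = True


flag = ShutdownFlag()
# signal.signal() must be called from the main thread of the process.
signal.signal(signal.SIGTERM, flag.handle)


def slow_job(steps, step_seconds=0.01, on_step=None):
    """Run up to `steps` units of work, stopping early after SIGTERM.

    Returns (completed_steps, finished) so the caller can re-queue the rest.
    """
    for i in range(steps):
        if flag.requested:
            return i, False        # stopped early; remaining work is not done
        if on_step is not None:
            on_step(i)             # hook used below to simulate a restart
        time.sleep(step_seconds)   # stand-in for one unit of real work
    return steps, True


if __name__ == "__main__":
    # Simulate a platform restart by sending SIGTERM to this process.
    def send_sigterm_at_step_3(i):
        if i == 3:
            os.kill(os.getpid(), signal.SIGTERM)

    completed, finished = slow_job(100, on_step=send_sigterm_at_step_3)
    print("completed steps:", completed, "finished:", finished)
```

The design choice here is to poll a flag rather than raise an exception from the signal handler, so the job is only ever interrupted at a step boundary. Each step has to be short compared with the delay before SIGKILL (about five seconds in the foreman log above), otherwise the process is killed before the loop sees the flag.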
