Unhandled exception in celery freezes workers

Problem description

I am running celery in docker via a redis backend. I have

  • a celery container
  • a celery worker container
  • a redis container
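
A minimal sketch of how the app might be wired to the redis container is shown below; the module name app.py, the hostname redis, and the database numbers are assumptions for illustration, not details from the question.

# app.py -- hypothetical wiring of the Celery app to the redis container
from celery import Celery

celery_app = Celery(
    "tasks",
    broker="redis://redis:6379/0",   # the redis container acts as the broker
    backend="redis://redis:6379/1",  # and as the result backend
)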

The celery worker container spawns 6 worker processes. If a celery task encounters an exception, the workers (all of them) stop consuming jobs. I tried to debug the processes a bit, and it appears a single process gets stuck on a pipe read and the rest on a futex call.
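
To make the failure mode concrete, a task of the kind named in the strace dump further down (tasks.status_task) might look like the sketch below; the body is invented purely to show an unhandled exception and is not the original task.

# tasks.py -- hypothetical task; an unhandled exception raised here is the
# kind of error that appeared to freeze the whole pre-fork pool.
from app import celery_app  # the app object assumed in the sketch above

@celery_app.task
def status_task():
    raise RuntimeError("simulated unhandled exception")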

Debug info:

# Worker 1
$ sudo strace -p 15959 -s 10000
strace: Process 15959 attached
read(4, ^Cstrace: Process 15959 detached
 <detached ...>

# Worker 2 through N
$ sudo strace -p 15960 -s 10000
strace: Process 15960 attached
futex(0x7f95c3f94000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff^Cstrace: Process 15960 detached
 <detached ...>

$ sudo lsof -p 15958
COMMAND   PID USER   FD      TYPE DEVICE SIZE/OFF   NODE NAME
celery  15958 root  txt       REG  0,197    32248 264184 /usr/local/bin/python3.5
...
celery  15958 root    4r     FIFO   0,12      0t0 348559 pipe  # frozen here

Strangely, the workers will stay in a frozen state permanently, unless one of two things happens:

  1. Restarting the workers (docker restart celery-worker)
  2. "Jump starting" celery.

The "jump start celery" I find amusing. By issuing this command all the workers "wake up" and spring back to life and start consuming tasks until the next exception.

docker exec -it celery-worker celery -A CELERY_APP inspect active

Here the worker comes back to life:

$ sudo strace -p 15958 -s 10000
strace: Process 15958 attached
read(4, "\0\0\3\36", 4)                 = 4
read(4, "\200\3K\2(Mj\nNccelery.app.trace\n_fast_trace_task\nq\0(X\"\0\0\0tasks.status_taskq\1X$\0\0\00071bf9972-cf5b-4a20-a8b7-ce4d7921fe0dq\2}q\3(X\t\0\0\0parent_idq\4NX\4\0\0\0langq\5X\2\0\0\0pyq\6X\3\0\0\0etaq\7NX\5\0\0\0groupq\10NX\7\0\0\0expiresq\tNX\t\0\0\0timelimitq\n]q\v(NNeX\6\0\0\0originq\fX\21\0\0\0gen1@03e7668436e5q\rX\10\0\0\0argsreprq\16X\2\0\0\0()q\17X\n\0\0\0kwargsreprq\20X\2\0\0\0{}q\21X\10\0\0\0reply_toq\22X$\0\0\0005ad0db0b-a759-375c-b173-07598914633eq\23X\4\0\0\0taskq\24h\1X\16\0\0\0correlation_idq\25X$\0\0\00071bf9972-cf5b-4a20-a8b7-ce4d7921fe0dq\26X\7\0\0\0root_idq\27X$\0\0\00071bf9972-cf5b-4a20-a8b7-ce4d7921fe0dq\30X\7\0\0\0retriesq\31K\0X\r\0\0\0delivery_infoq\32}q\33(X\10\0\0\0priorityq\34K\0X\10\0\0\0exchangeq\35X\0\0\0\0q\36X\v\0\0\0redeliveredq\37NX\v\0\0\0routing_keyq X\6\0\0\0celeryq!uX\6\0\0\0shadowq\"NX\2\0\0\0idq#h\2uCM[[], {}, {\"chord\": null, \"chain\": null, \"errbacks\": null, \"callbacks\": null}]q$X\20\0\0\0application/jsonq%X\5\0\0\0utf-8q&tq'}q(tq)\206q*.", 798) = 798
futex(0x7f95c3f94000, FUTEX_WAKE, 1)    = 1
write(7, "\0\0\0\34\200\3K\0(Mj\nNG@\327\204T\213\21\21\\K\nNtq\0\206q\1.", 32) = 32
getpid()                                = 10

Any idea why this is? Is this a bug? Is there something I can configure so celery does not hang on task exception?

Recommended answer

I am using eventlet, but I was running the workers with the default pre-fork pool. Switching to the eventlet pool seems to have fixed the problem.

celery worker -A CELERY_APP --pool eventlet
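
Note that the eventlet pool only works if the eventlet package is installed inside the worker container (for example via pip install eventlet); that is a general Celery requirement rather than something stated in the original answer. Unlike the default pre-fork pool, the eventlet pool runs tasks as green threads in a single process, so it avoids the forked child processes and pipe reads that were getting stuck here.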
