Airflow tasks get stuck at "queued" status and never start running

Question

I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on Kubernetes and Docker. I use the Celery Executor with Redis, and my tasks look like this:

(start) -> (do_work_for_product1)
     ├  -> (do_work_for_product2)
     ├  -> (do_work_for_product3)
     ├  …

So the start task has multiple downstream tasks. I set up the concurrency-related configuration as follows:

parallelism = 3
dag_concurrency = 3
max_active_runs = 1
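
As a rough illustration of how these limits interact, the sketch below parses the three settings with Python's standard configparser and computes an upper bound on how many tasks of this DAG can run at once. The `[core]` section header and the helper names `max_concurrent_tasks` and `waiting_tasks` are assumptions added for the example. It reflects one reading of what the settings mean and is not Airflow's actual scheduler logic.

```python
import configparser

# Settings copied from the question; the [core] header is assumed for parsing.
CONFIG_TEXT = """
[core]
parallelism = 3
dag_concurrency = 3
max_active_runs = 1
"""


def max_concurrent_tasks(config_text):
    """Upper bound on tasks of a single DAG that can run at the same time."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    core = parser["core"]
    # parallelism caps running tasks across the whole installation,
    # dag_concurrency caps them per DAG; the tighter limit wins.
    return min(core.getint("parallelism"), core.getint("dag_concurrency"))


def waiting_tasks(downstream_count, limit):
    """Number of downstream tasks that must wait for a free slot."""
    return max(0, downstream_count - limit)


limit = max_concurrent_tasks(CONFIG_TEXT)
print("tasks that can run at once:", limit)
print("tasks waiting out of 5 downstreams:", waiting_tasks(5, limit))
```

With more downstream tasks than free slots, some of them waiting in the queue for a while is expected. The problem described here is that they never leave it.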

When I run this DAG manually (I'm not sure whether this also happens on a scheduled run), some downstream tasks get executed, but others get stuck at "queued" status.

If I clear a stuck task from the Admin UI, it gets executed. There is no worker log: after processing the first few downstream tasks, the worker just stops outputting any log.

Web server's log (I'm not sure whether the worker exiting is related):

/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
[2017-08-24 04:20:56,496] [51] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow_dags
[2017-08-24 04:20:57 +0000] [27] [INFO] Handling signal: ttou
[2017-08-24 04:20:57 +0000] [37] [INFO] Worker exiting (pid: 37)

There is no error log on the scheduler either. The number of tasks that get stuck changes each time I try this.

Because I also use Docker, I'm wondering whether this issue is related, but so far I have no clue: https://github.com/puckel/docker-airflow/issues/94

Has anyone faced a similar issue, or does anyone have an idea of what I could investigate?

Recommended answer

Tasks getting stuck is most likely a bug. At the moment (<= 1.9.0alpha1) it can happen when a task cannot even start up on the (remote) worker, for example when the worker is overloaded or dependencies are missing.

This patch should resolve that issue.

It is worth investigating why your tasks do not reach the RUNNING state. Setting itself to this state is the first thing a task does. Normally the worker logs before it starts executing, and it also reports any errors. You should be able to find these entries in the task log.
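
One way to narrow down which task logs to read first is to list the tasks that have sat in "queued" for longer than expected. The sketch below is a minimal, standalone illustration: the record layout (dicts with `task_id`, `state`, `queued_at`), the ten-minute threshold, and the sample timestamps are all hypothetical. In a real deployment the states would come from Airflow's metadata database or the UI.

```python
from datetime import datetime, timedelta


def find_stuck_tasks(task_instances, now, max_queued=timedelta(minutes=10)):
    """Return ids of tasks that have been 'queued' longer than max_queued."""
    stuck = []
    for ti in task_instances:
        if ti["state"] == "queued" and now - ti["queued_at"] > max_queued:
            stuck.append(ti["task_id"])
    return stuck


# Hypothetical snapshot of task states, for illustration only.
now = datetime(2017, 8, 24, 4, 30)
snapshot = [
    {"task_id": "do_work_for_product1", "state": "success",
     "queued_at": datetime(2017, 8, 24, 4, 0)},
    {"task_id": "do_work_for_product2", "state": "queued",
     "queued_at": datetime(2017, 8, 24, 4, 0)},
    {"task_id": "do_work_for_product3", "state": "queued",
     "queued_at": datetime(2017, 8, 24, 4, 25)},
]

print(find_stuck_tasks(snapshot, now))
```

A task flagged this way never set itself to RUNNING, so any startup error from the worker should appear in that task's log.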

Edit: As was mentioned in the comments on the original question, one example of Airflow not being able to run a task is when it cannot write to required locations. This makes it unable to proceed, and the task gets stuck. The patch fixes this by failing the task from the scheduler.
