Airflow Scheduler not picking up DAG Runs

Problem Description

I'm setting up Airflow so that the webserver runs on one machine and the scheduler runs on another. Both share the same MySQL metastore database. Both instances come up without any errors in the logs, but the scheduler is not picking up any DAG Runs that are created by manually triggering DAGs via the Web UI.
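
For reference, here is a minimal sketch of the relevant airflow.cfg settings used on both machines (the host, credentials, and database name below are placeholders, not my real values):

[core]
# Identical connection string on both machines, so they share one metastore
sql_alchemy_conn = mysql://airflow:airflow@mysql-host:3306/airflow
executor = LocalExecutor
load_examples = True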

The dag_run table in MySQL shows a few entries, all in the running state:

mysql> select * from dag_run;
+----+--------------------------------+----------------------------+---------+------------------------------------+------------------+----------------+----------+----------------------------+
| id | dag_id                         | execution_date             | state   | run_id                             | external_trigger | conf   | end_date | start_date                 |
+----+--------------------------------+----------------------------+---------+------------------------------------+------------------+----------------+----------+----------------------------+
|  1 | example_bash_operator          | 2017-12-14 11:33:08.479040 | running | manual__2017-12-14T11:33:08.479040 |                1 | ��       }�.    | NULL     | 2017-12-14 11:33:09.000000 |
|  2 | example_bash_operator          | 2017-12-14 11:38:27.888317 | running | manual__2017-12-14T11:38:27.888317 |                1 | ��       }�.    | NULL     | 2017-12-14 11:38:27.000000 |
|  3 | example_branch_dop_operator_v3 | 2017-12-14 13:47:05.170752 | running | manual__2017-12-14T13:47:05.170752 |                1 | ��       }�.    | NULL     | 2017-12-14 13:47:05.000000 |
|  4 | example_branch_dop_operator_v3 | 2017-12-15 04:26:07.208501 | running | manual__2017-12-15T04:26:07.208501 |                1 | ��       }�.    | NULL     | 2017-12-15 04:26:07.000000 |
|  5 | example_branch_dop_operator_v3 | 2017-12-15 06:12:10.965543 | running | manual__2017-12-15T06:12:10.965543 |                1 | ��       }�.    | NULL     | 2017-12-15 06:12:11.000000 |
|  6 | example_branch_dop_operator_v3 | 2017-12-15 06:28:43.282447 | running | manual__2017-12-15T06:28:43.282447 |                1 | ��       }�.    | NULL     | 2017-12-15 06:28:43.000000 |
+----+--------------------------------+----------------------------+---------+------------------------------------+------------------+----------------+----------+----------------------------+
6 rows in set (0.21 sec)

But the Scheduler started on the other machine, connected to the same MySQL DB, just isn't picking up these DAG runs and converting them into Task Instances.

Not sure what I'm missing in the setup here. So, a few questions:


  1. When and how does the DAGs folder at $AIRFLOW_HOME/dags get populated? I thought this happens when the webserver is started. But if I only start the scheduler on another machine, how does the DAGs folder on that machine get populated?

  2. Currently I only ran airflow initdb on the machine hosting the webserver, not on the scheduler. Hopefully that's correct.

Can I enable debug logs for the Scheduler to get more output that might indicate what's missing? From the current logs, it looks like it only scans the DAGs folder on the local system and finds no DAGs there (not even the example ones), despite load_examples being set to True in the config.
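
One option I'm aware of, assuming an Airflow version that has it (1.9 or later), is the logging_level setting in airflow.cfg:

[core]
# Raise log verbosity for the scheduler and other components
logging_level = DEBUG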

I don't think it matters, but I'm currently using a LocalExecutor.

Any help is appreciated.

EDIT: I know that I need to sync the DAGs folder across machines, as the Airflow docs suggest, but I'm not sure if this is the reason why the Scheduler is not picking up the tasks in the case above.

Recommended Answer

Ok, I got the answer: it looks like the Scheduler does not query the DB unless there are DAGs in the local DAGs folder. The code in job.py looks like:

ti_query = (
    session
    .query(TI)
    # Only task instances whose DAG is present in the locally parsed DagBag;
    # with an empty DAGs folder, this IN clause can never match
    .filter(TI.dag_id.in_(simple_dag_bag.dag_ids))
    .outerjoin(DR,
        and_(DR.dag_id == TI.dag_id,
             DR.execution_date == TI.execution_date))
    # Exclude runs created by backfill jobs
    .filter(or_(DR.run_id == None,
            not_(DR.run_id.like(BackfillJob.ID_PREFIX + '%'))))
    .outerjoin(DM, DM.dag_id == TI.dag_id)
    # Exclude paused DAGs
    .filter(or_(DM.dag_id == None,
            not_(DM.is_paused)))
)

I added a simple DAG to the local DAGs folder on the machine hosting the Scheduler, and it started picking up the other DAG runs as well. Once the local DagBag is non-empty, the in_() filter above has dag_ids to match against.
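
For anyone hitting the same issue, the unblocking DAG can be as trivial as the following sketch (the dag_id and dates are illustrative, not the actual file I used):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# A do-nothing placeholder DAG. Its only purpose is to make the local DagBag
# non-empty so the scheduler's query above has dag_ids to filter on.
dag = DAG(
    dag_id='placeholder_dag',          # illustrative name
    start_date=datetime(2017, 12, 1),
    schedule_interval=None,            # never scheduled on its own
)

DummyOperator(task_id='noop', dag=dag)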

We raised an issue for this - https://issues.apache.org/jira/browse/AIRFLOW-1934
