Issues running airflow scheduler as a daemon process


Question

I have an EC2 instance that is running airflow 1.8.0 using LocalExecutor. Per the docs I would have expected that one of the following two commands would have raised the scheduler in daemon mode:

airflow scheduler --daemon --num_runs=20

airflow scheduler --daemon=True --num_runs=5

But that isn't the case. The first command seems like it's going to work, but it just returns the following output before returning to terminal without producing any background task:

[2017-09-28 18:15:02,794] {__init__.py:57} INFO - Using executor LocalExecutor
[2017-09-28 18:15:03,064] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/Grammar.txt
[2017-09-28 18:15:03,203] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python3.5/lib2to3/PatternGrammar.txt

The second command produces the error:

airflow scheduler: error: argument -D/--daemon: ignored explicit argument 'True'

Which is odd, because according to the docs --daemon=True should be a valid argument for the airflow scheduler call.

Digging a little deeper took me to this StackOverflow post, where one of the responses recommends an implementation of systemd for handling the airflow scheduler as a background process according to the code available as this repo.

My lightly-edited adaptations of the script are posted as the following Gists. I am using a vanilla m4.xlarge EC2 instance with Ubuntu 16.04.3:

  • /etc/sysconfig/airflow
  • /usr/lib/systemd/system/airflow-scheduler.service
  • /etc/tmpfiles.d/airflow.conf
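For context, a scheduler unit file of this kind looks roughly like the sketch below. This is not the asker's exact Gist; the `ExecStart` path, `User`, and the database units in `After=`/`Wants=` are assumptions that must match your own install (e.g. check the output of `which airflow`):

```ini
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service
Wants=postgresql.service mysql.service

[Service]
# /etc/sysconfig/airflow holds environment variables such as AIRFLOW_HOME
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
# Adjust this path to wherever the airflow binary is installed
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target
```

Note that under systemd the scheduler runs in the foreground (`Type=simple`), so the `-D`/`--daemon` flag is not passed here; systemd itself handles the backgrounding.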

From there I call:

sudo systemctl enable airflow-scheduler
sudo systemctl start airflow-scheduler

And nothing happens. While I have much more complex DAGs running on this instance, I am using this dummy case to create a simple test that also serves as a listener to let me know when the scheduler is operating as planned.

I've been using journalctl -f to debug. Here are a few lines of output from the scheduler process. There's no obvious problem, but my tasks aren't executing and no logs are being produced for the test DAG that would help me zoom in on the error. Is the problem in here somewhere?

Sep 28 18:39:30 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:30,965] {dag_processing.py:627} INFO - Started a process (PID: 21822) to generate tasks for /home/ubuntu/airflow/dags/scheduler_test_dag.py - logging into /home/ubuntu/airflow/logs/scheduler/2017-09-28/scheduler_test_dag.py.log
Sep 28 18:39:31 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:31,016] {jobs.py:1002} INFO - No tasks to send to the executor
Sep 28 18:39:31 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:31,020] {jobs.py:1440} INFO - Heartbeating the executor
Sep 28 18:39:32 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:32,022] {jobs.py:1404} INFO - Heartbeating the process manager
Sep 28 18:39:32 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:32,023] {jobs.py:1440} INFO - Heartbeating the executor
Sep 28 18:39:33 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:33,024] {jobs.py:1404} INFO - Heartbeating the process manager
Sep 28 18:39:33 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:33,025] {dag_processing.py:559} INFO - Processor for /home/ubuntu/airflow/dags/capone_dash_dag.py finished
Sep 28 18:39:33 ip-172-31-15-209 airflow[20603]: [2017-09-28 18:39:33,026] {dag_processing.py:559} INFO - Processor for /home/ubuntu/airflow/dags/scheduler_test_dag.py finished

When I run airflow scheduler manually this all works fine. Since my test DAG has a start date of September 9 it just keeps backfilling every minute since then, producing a running time ticker. When I use systemd to run the scheduler as a daemon, however, it's totally quiet with no obvious source of the error.

Any ideas?

Answer

The documentation might be out of date?

I normally start the Airflow daemons as follows:

airflow kerberos -D
airflow scheduler -D
airflow webserver -D

Here's the airflow webserver --help output (from version 1.8):


-D, --daemon Daemonize instead of running in the foreground

Notice that no boolean value is possible there; the flag takes no argument. The documentation has to be fixed.

Quick note in case airflow scheduler -D fails:

This is included in the comments, but it seems like it's worth mentioning here. When you run your airflow scheduler it will create the file $AIRFLOW_HOME/airflow-scheduler.pid. If you try to re-run the airflow scheduler daemon process this will almost certainly produce the file $AIRFLOW_HOME/airflow-scheduler.err, which will tell you that lockfile.AlreadyLocked: /home/ubuntu/airflow/airflow-scheduler.pid is already locked. If your scheduler daemon is indeed out of commission and you find yourself needing to restart it, execute the following commands:

sudo rm $AIRFLOW_HOME/airflow-scheduler.err $AIRFLOW_HOME/airflow-scheduler.pid
airflow scheduler -D 
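The cleanup step above can be sketched as a small script that only removes the files when they actually exist. This is an illustrative sketch, not part of the original answer; it uses a throwaway `mktemp` directory to stand in for a real $AIRFLOW_HOME so it can be run safely anywhere:

```shell
#!/bin/sh
# Stand-in for your real AIRFLOW_HOME (normally something like ~/airflow)
AIRFLOW_HOME="$(mktemp -d)"

# Simulate the stale lockfile and error log left behind by a dead scheduler
touch "$AIRFLOW_HOME/airflow-scheduler.pid"
touch "$AIRFLOW_HOME/airflow-scheduler.err"

# Remove each leftover file if present, reporting what was cleaned up
for f in "$AIRFLOW_HOME/airflow-scheduler.pid" "$AIRFLOW_HOME/airflow-scheduler.err"; do
    if [ -f "$f" ]; then
        rm "$f"
        echo "removed $f"
    fi
done

# With the stale lockfile gone, `airflow scheduler -D` can daemonize again.
```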

This got my scheduler back on track.
