Apache Airflow scheduler does not trigger DAG at schedule time

Problem description

When I schedule DAGs to run at a specific time everyday, the DAG execution does not take place at all. However, when I restart Airflow webserver and scheduler, the DAGs execute once on the scheduled time for that particular day and do not execute from the next day onwards. I am using Airflow version v1.7.1.3 with python 2.7.6. Here goes the DAG code:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

import time

# NOTE: start_date is computed dynamically at DAG-parse time, so it always
# resolves to midnight of whichever day the scheduler parses this file.
n = time.strftime("%Y,%m,%d")
v = datetime.strptime(n, "%Y,%m,%d")
default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'start_date': v,
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=10),

}

dag = DAG('dag_user_answer_attempts', default_args=default_args, schedule_interval='03 02 * * *')

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
    task_id='user_answer_attempts',
    bash_command='python /home/ubuntu/bigcrons/appengine-flask-skeleton-master/useranswerattemptsgen.py',
    dag=dag)

Am I doing something wrong?

Answer

Your issue is the start_date being set for the current time. Airflow runs jobs at the end of an interval, not the beginning. This means that the first run of your job is going to be after the first interval.

Example:

You make a dag and put it live in Airflow at midnight. Today (20XX-01-01 00:00:00) is also the start_date, but it is hard-coded ("start_date":datetime(20XX,1,1)). The schedule interval is daily, like yours (3 2 * * *).

The first time this dag will be queued for execution is 20XX-01-02 02:03:00, because that is when the interval period ends. If you look at your dag being run at that time, it should have a started datetime of roughly one day after the schedule_date.
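
To make the timeline concrete, here is a tiny runnable sketch of the dates involved (the 20XX example written as 2016 so it executes; the exact execution_date bookkeeping can differ slightly between Airflow versions):

from datetime import datetime

# Hypothetical dates mirroring the example above.
start_date = datetime(2016, 1, 1)             # hard-coded start_date (midnight)

# schedule_interval "03 02 * * *" ticks once per day at 02:03.
first_tick = datetime(2016, 1, 1, 2, 3)       # first tick at/after start_date
second_tick = datetime(2016, 1, 2, 2, 3)      # the tick that closes that interval

# Airflow stamps the run with the *start* of the interval, but only
# creates and queues it once the interval has ended.
print("execution_date of first run:", first_tick)    # 2016-01-01 02:03:00
print("actually queued around     :", second_tick)   # 2016-01-02 02:03:00
print("lag:", second_tick - first_tick)               # 1 day, 0:00:00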

You can solve this by having your start_date hard-coded to a date or by making sure that the dynamic date is further in the past than the interval between executions (In your case, 2 days would be plenty). Airflow recommends you use static start_dates in case you need to re-run jobs or backfill (or end a dag).
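
As a concrete illustration, here is a minimal sketch of the DAG from the question with a static start_date, assuming the task and schedule stay the same; the value datetime(2016, 1, 1) is only a placeholder, pick whatever past date fits the history you need:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    # Static start_date in the past, as recommended above
    # (placeholder value -- choose the date you actually want history from).
    'start_date': datetime(2016, 1, 1),
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=10),
}

dag = DAG('dag_user_answer_attempts', default_args=default_args, schedule_interval='03 02 * * *')

t1 = BashOperator(
    task_id='user_answer_attempts',
    bash_command='python /home/ubuntu/bigcrons/appengine-flask-skeleton-master/useranswerattemptsgen.py',
    dag=dag)

Keep in mind that the scheduler will create a run for every interval between that start_date and now, and with depends_on_past=True those runs will execute strictly in order.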

For more information on backfilling (the opposite side of this common stackoverflow question), check the docs or this question: Airflow not scheduling Correctly Python
