Apache Airflow scheduler does not trigger DAG at schedule time


Problem description

When I schedule DAGs to run at a specific time everyday, the DAG execution does not take place at all. However, when I restart Airflow webserver and scheduler, the DAGs execute once on the scheduled time for that particular day and do not execute from the next day onwards. I am using Airflow version v1.7.1.3 with python 2.7.6. Here goes the DAG code:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

import time
n=time.strftime("%Y,%m,%d")
v=datetime.strptime(n,"%Y,%m,%d")
default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'start_date': v,
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=10),

}

dag = DAG('dag_user_answer_attempts', default_args=default_args, schedule_interval='03 02 * * *')

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
    task_id='user_answer_attempts',
    bash_command='python /home/ubuntu/bigcrons/appengine-flask-skeleton-master/useranswerattemptsgen.py',
    dag=dag)

Am I doing something wrong?

Recommended answer

Your issue is that the start_date is set to the current time. Airflow runs jobs at the end of an interval, not the beginning. This means that the first run of your job will only happen after the first full interval has elapsed.
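The scheduling rule can be illustrated with plain datetime arithmetic (a hypothetical sketch, not Airflow's actual scheduler code; the dates are made up for the example):

```python
from datetime import datetime, timedelta

# Sketch: a run whose execution_date equals start_date is only queued
# once the interval *following* start_date has fully elapsed.
start_date = datetime(2016, 1, 1, 2, 3)  # example date; '03 02 * * *' fires at 02:03
interval = timedelta(days=1)             # daily schedule interval

# The first DagRun is queued one full interval after start_date.
first_trigger = start_date + interval
print(first_trigger)  # 2016-01-02 02:03:00
```

With a dynamic start_date that is recomputed to "now" every time the scheduler parses the DAG file, this trigger time keeps sliding into the future, so the run never becomes due.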

Example:

You make a dag and put it live in Airflow at midnight. Today (20XX-01-01 00:00:00) is also the start_date, but it is hard-coded ("start_date":datetime(20XX,1,1)). The schedule interval is daily, like yours (3 2 * * *).

The first time this dag will be queued for execution is 20XX-01-02 02:03:00, because that is when the interval period ends. If you look at your dag being run at that time, it should have a started datetime of roughly one day after the schedule_date.

You can solve this by hard-coding your start_date to a fixed date, or by making sure that the dynamic date is further in the past than the interval between executions (in your case, 2 days would be plenty). Airflow recommends static start_dates in case you need to re-run jobs or backfill (or end a dag).
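A sketch of the fixed DAG, assuming the same tasks as in the question but with a static start_date safely in the past (the specific date is illustrative, not from the original post):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    # Static start_date, far enough in the past that at least one full
    # schedule interval has already elapsed (illustrative example date).
    'start_date': datetime(2016, 1, 1),
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=10),
}

dag = DAG('dag_user_answer_attempts',
          default_args=default_args,
          schedule_interval='03 02 * * *')

t1 = BashOperator(
    task_id='user_answer_attempts',
    bash_command='python /home/ubuntu/bigcrons/appengine-flask-skeleton-master/useranswerattemptsgen.py',
    dag=dag)
```

Because the start_date no longer changes between scheduler parses, the first run is queued at 2016-01-02 02:03:00 and the DAG continues to run daily from then on.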

For more information on backfilling (the opposite side of this common stackoverflow question), check the docs or this question: Airflow not scheduling Correctly Python
