How do I stop Airflow from running a task the first time I unpause it?

Question

I have a DAG. Here is a sample of the parameters.

from datetime import datetime

from airflow import DAG

dag = DAG(
    'My Dag',
    default_args=default_args,  # defined elsewhere (not shown in the question)
    description='Cron Job : My Dag',
    schedule_interval='45 07 * * *',
    # start_date=days_ago(0),
    start_date=datetime(2021, 4, 6, 10, 45),
    tags=['My Dag Tag'],
    concurrency=1,
    is_paused_upon_creation=True,
    catchup=False,  # don't run previous intervals or backfill; run only the latest
)

From reading the Airflow documentation, I think I have set the DAG to run at 7:45 every day. However, if I pause the DAG and unpause it a couple of days later, it still runs as soon as I unpause it (only for that day, of course, since catchup=False avoids backfills). That is not the expected behaviour, right? I scheduled it for 7:45, so when I unpause it at 10:00 it should not run at all until the next 7:45.

What am I missing here?

Answer

I assume that you are familiar with Airflow's scheduling mechanism. If not, please read "Problem with start date and scheduled date in Apache airflow" before reading the rest of this answer.

As for your case: you had one or several runs, as expected, when you deployed the DAG. You paused the DAG on 2021-04-07 and unpaused it today (2021-04-19). Airflow then executed a DAG run with execution_date='2021-04-18'.

This is expected.

The reason lies in Airflow's scheduling mechanism. Your last run was on 2021-04-07, and the interval is 45 07 * * * (every day at 07:45). Because you paused the DAG, the runs for 2021-04-08, 2021-04-09, ..., 2021-04-17 were never created. When you unpaused the DAG, Airflow did not create those runs because of catchup=False. Today's run (2021-04-19), however, is not part of the catchup. A run is triggered once its interval has ended, and the interval for execution_date=2021-04-18 ended at 07:45 on 2021-04-19. By the time you unpaused the DAG that interval was already complete, so the run was scheduled and started immediately.
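To make the interval arithmetic concrete, here is a minimal sketch in plain Python. It is not Airflow's scheduler code, and latest_completed_interval is a hypothetical helper written only for this illustration. It assumes a fixed daily schedule at 07:45 and naive datetimes, and it returns the most recent interval that has fully elapsed at a given moment, which is the interval Airflow runs when catchup=False.

```python
from datetime import datetime, timedelta


def latest_completed_interval(now, hour=7, minute=45):
    """Return (start, end) of the latest daily interval that ended at or before `now`.

    Hypothetical helper for illustration only; assumes a daily cron schedule
    at hour:minute and naive datetimes.
    """
    # Most recent schedule tick on the same calendar day as `now`.
    interval_end = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's tick has not happened yet, the last tick was yesterday.
    if interval_end > now:
        interval_end -= timedelta(days=1)
    # The interval starts one schedule period (one day) before it ends.
    interval_start = interval_end - timedelta(days=1)
    return interval_start, interval_end


unpaused_at = datetime(2021, 4, 19, 10, 0)
start, end = latest_completed_interval(unpaused_at)
print(start)  # 2021-04-18 07:45:00 (this is the execution_date)
print(end)    # 2021-04-19 07:45:00 (already in the past at 10:00)
```

Because the end of that interval (07:45 on 2021-04-19) is earlier than the unpause time (10:00), the run is due right away. Unpausing before 07:45 would instead select the interval ending the previous day.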

The behavior that you are experiencing isn't different than deploying this fresh DAG:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2020, 1, 1),
}

# catchup=False: only the latest completed interval is scheduled
with DAG(dag_id='stackoverflow_question',
         default_args=default_args,
         schedule_interval='45 07 * * *',
         catchup=False
         ) as dag:
    DummyOperator(task_id='some_task')

As soon as you deploy it, a single run is created. The DAG's start_date is 2020-01-01 and catchup=False. I deployed the DAG today (2021-04-19), so Airflow created a run with execution_date='2021-04-18', which started running today (2021-04-19).
