Why is it recommended against using a dynamic start_date in Airflow?


Question

I've read the Airflow FAQ entry "What's the deal with start_date?", but it still isn't clear to me why using a dynamic start_date is recommended against.

To my understanding, a DAG's execution_date is determined by the minimum start_date among all of the DAG's tasks, and subsequent DAG Runs are run at the latest execution_date + schedule_interval.

If I set the start_date in my DAG's default_args to, say, yesterday at 20:00:00, with a schedule_interval of 1 day, how would that break or confuse the scheduler, if at all? If I understand correctly, the scheduler would trigger the DAG with an execution_date of yesterday at 20:00:00, and the next DAG Run would be scheduled for today at 20:00:00.

Is there some concept I'm misunderstanding?

Recommended answer

The first run happens at start_date + schedule_interval. Airflow does not run the DAG at start_date itself; it always runs it at start_date + schedule_interval, that is, once the first schedule interval has elapsed.
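The timing rule above can be sketched with plain datetime arithmetic. This is a simplified model of interval-based scheduling, not Airflow's actual scheduler code; the helper name first_run and the concrete dates are hypothetical, chosen to mirror the 20:00 daily schedule from the question.

```python
from datetime import datetime, timedelta


def first_run(start_date: datetime, schedule_interval: timedelta):
    """Return (execution_date, trigger_time) for the first DAG run.

    Simplified model (an assumption, not Airflow internals): the first run
    covers the interval that begins at start_date and is triggered only
    once that interval has ended.
    """
    execution_date = start_date
    trigger_time = start_date + schedule_interval
    return execution_date, trigger_time


# Hypothetical fixed start_date at 20:00 with a daily schedule.
start = datetime(2024, 1, 1, 20, 0, 0)
execution_date, trigger_time = first_run(start, timedelta(days=1))

print("execution_date:", execution_date)
print("triggered at:  ", trigger_time)
```

Under this model the run labelled with the start_date is only triggered one full interval later, which is why nothing runs at start_date itself.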

As the documentation mentions, if you give a dynamic start_date such as datetime.now() together with a schedule_interval (say, 1 hour), that run will never execute: now() moves along with time, so the current time can never reach datetime.now() + 1 hour.
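A small simulation makes the moving-target problem concrete. It assumes, as a simplification, that the start_date expression is re-evaluated every time the scheduler checks the DAG; the function first_due_tick and the simulated clock are hypothetical and only stand in for that behaviour, they are not part of Airflow.

```python
from datetime import datetime, timedelta


def first_due_tick(get_start_date, schedule_interval, ticks):
    """Return the first simulated time at which the first run is due.

    get_start_date(now) stands in for re-evaluating the DAG definition
    at each scheduler check. Returns None if the run never becomes due.
    """
    for now in ticks:
        if now >= get_start_date(now) + schedule_interval:
            return now
    return None


interval = timedelta(hours=1)
t0 = datetime(2024, 1, 1, 0, 0, 0)
# Simulated scheduler checks every 10 minutes for about four hours.
ticks = [t0 + timedelta(minutes=10 * i) for i in range(24)]

# Static start_date: the value is the same at every check.
static_due = first_due_tick(lambda now: t0, interval, ticks)

# Dynamic start_date: behaves like datetime.now(), so it moves with the clock.
dynamic_due = first_due_tick(lambda now: now, interval, ticks)

print("static start_date, first run due at: ", static_due)
print("dynamic start_date, first run due at:", dynamic_due)
```

With the static value the run becomes due exactly one interval after t0; with the dynamic value the condition now >= now + 1 hour is never true, so no run is ever triggered.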
