Why does Airflow revert to the old start_date when it is changed without renaming the DAG?

Question

I am a data engineer and work with Airflow regularly.

When redeploying DAGs with a new start date, the best practice is as described here:


Don't change start_date + interval: When a DAG has been run, the scheduler database contains instances of the run of that DAG. If you change the start_date or the interval and redeploy it, the scheduler may get confused because the intervals are different or the start_date is way back. The best way to deal with this is to change the version of the DAG as soon as you change the start_date or interval, i.e. my_dag_v1 and my_dag_v2. This way, historical information is also kept about the old version.
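The versioning convention in the quoted guidance can be sketched in plain Python. This is only an illustration of the naming rule, not part of Airflow: the helper names `bump_dag_version` and `dag_id_for` are hypothetical, and treating an unversioned id as version 1 is an assumption.

```python
import re
from datetime import datetime, timedelta


def bump_dag_version(dag_id):
    """Increment a trailing _v<N> suffix; an id without one is assumed to be v1."""
    match = re.fullmatch(r"(.+)_v(\d+)", dag_id)
    if match is None:
        return f"{dag_id}_v2"
    return f"{match.group(1)}_v{int(match.group(2)) + 1}"


def dag_id_for(current_id, old_schedule, new_schedule):
    """Keep the id when (start_date, interval) is unchanged, otherwise bump it."""
    if old_schedule == new_schedule:
        return current_id
    return bump_dag_version(current_id)


# Hypothetical schedules: same daily interval, different start_date.
old = (datetime(2020, 1, 1), timedelta(days=1))
new = (datetime(2020, 2, 1), timedelta(days=1))
print(dag_id_for("my_dag_v1", old, new))  # my_dag_v2
```

The resulting string would then be passed as the DAG's id, so the old id keeps its run history and the new id starts clean.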

However, after deleting all previous DAG and task runs, I tried to redeploy a DAG with a new start date. It worked as expected (with the new start date) for a day, then started to use the old start date again.

What is the reason for this?

Recommended answer

Airflow maintains all of the information about past runs in the dag_run table.

When you clear the previous DAG runs, these entries are dropped from the database. Hence, Airflow treats this DAG as a new DAG and starts at the specified time.

Airflow checks the last DAG execution time (the start_date of the last run) and adds the timedelta object that you specified in schedule_interval.
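The arithmetic described here can be illustrated with a small, simplified model. This is not the scheduler's actual code: the function `next_run` and the dates are hypothetical, and the model assumes only what the answer states, namely that a recorded previous run takes precedence over the configured start_date.

```python
from datetime import datetime, timedelta


def next_run(start_date, last_execution_date, schedule_interval):
    """Simplified model: with no recorded run, begin at start_date;
    otherwise add the interval to the last recorded execution time."""
    if last_execution_date is None:
        return start_date
    return last_execution_date + schedule_interval


interval = timedelta(days=1)
new_start = datetime(2020, 6, 1)

# No rows left for this DAG: the new start_date is used.
fresh = next_run(new_start, None, interval)

# An old row survived: the new start_date is ignored.
stale = next_run(new_start, datetime(2020, 1, 10), interval)

print(fresh)  # 2020-06-01 00:00:00
print(stale)  # 2020-01-11 00:00:00
```

Under this model, any leftover run recorded for the same DAG id is enough to pull the schedule back to the old dates, which matches the behaviour described in the question.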

If you are still having difficulties even after clearing the DAG runs, there are a few things you can do:



  1. Rename the DAG as suggested.
  2. Clear all the DAG runs and keep the DAG paused. Create a DAG run, then turn the DAG on. It will run at the scheduled time afterwards.
  3. The best approach would be to use a crontab expression inside schedule_interval.
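As a rough sketch of options 1 and 3 combined, a DAG definition with a versioned id and a cron schedule might look like the following. The id `my_dag_v2`, the start date and the cron string are hypothetical values chosen for illustration, and the snippet assumes an Airflow version that still accepts the `schedule_interval` argument used in this answer.

```python
from datetime import datetime

from airflow import DAG

# Hypothetical values for illustration only.
dag = DAG(
    dag_id="my_dag_v2",               # renamed after changing start_date
    start_date=datetime(2020, 6, 1),  # the new start date
    schedule_interval="0 6 * * *",    # cron expression: every day at 06:00
)
```

A cron expression pins runs to fixed wall-clock times, whereas a timedelta interval is added to the time of the previous run.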
