Airflow: Re-run DAG from beginning with new schedule


Question

Backstory: I was running an Airflow job on a daily schedule, with a start_date of July 1, 2019. The job requested each day's data from a third party, then loaded that data into our database.

After running the job successfully for several days, I realized that the third-party data source only refreshed its data once a month. As such, I was simply downloading the same data every day.

At that point, I changed the start_date to a year ago (to get previous months' info), and changed the DAG's schedule to run once a month.

How do I (in the Airflow UI) restart the DAG completely, such that it recognizes my new start_date and schedule, and runs a complete backfill as if the DAG were brand new?

(I know this can be done via … However, I do not have permission for the command line interface, and the administrator is unreachable.)

Recommended answer

Click on the green circle in the DAG Runs column for the job in question in the web interface. This will bring you to a list of all successful runs.

Tick the check mark on the top left in the header of the list to select all instances, then in the menu above it choose "With selected" and then "Delete" in the drop-down menu. This should clear all existing DAG run instances.

If catchup_by_default is not enabled on your Airflow instance, make sure catchup=True is set on the DAG until it has finished catching up.
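To sanity-check the result after the old runs are deleted, it can help to know which runs a monthly catch-up should produce. The sketch below is plain Python, not Airflow code. It assumes the DAG is configured with something like start_date=datetime(2018, 7, 1), schedule_interval="@monthly" (newer Airflow releases call this argument schedule) and catchup=True, and that the scheduler creates one run per fully elapsed monthly interval, aligned to the first of the month. The function name monthly_run_dates and both dates are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def monthly_run_dates(start_date, now):
    """List the first-of-month dates whose monthly interval has ended by `now`.

    Illustration only: this approximates the runs a monthly catch-up is
    expected to create. It does not call Airflow.
    """
    year, month = start_date.year, start_date.month
    # Assumption: a start_date that is not on a month boundary is aligned
    # forward to the first day of the following month.
    if start_date != datetime(year, month, 1):
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    runs = []
    while True:
        interval_start = datetime(year, month, 1)
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        interval_end = datetime(next_year, next_month, 1)
        # A run is only created once its whole interval has elapsed.
        if interval_end > now:
            break
        runs.append(interval_start)
        year, month = next_year, next_month
    return runs


# Hypothetical values: a start_date one year back, checked on July 10, 2019.
expected = monthly_run_dates(datetime(2018, 7, 1), datetime(2019, 7, 10))
for run_date in expected:
    print(run_date.date())
```

If the DAG Runs list in the web interface ends up with a different set of dates once catch-up finishes, that suggests some old runs were not deleted or that catchup is not in effect for the DAG.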
