How to prevent Airflow from backfilling DAG runs?


Question

Say you have an Airflow DAG that doesn't make sense to backfill: once it has run, running it again several times in quick succession would be completely pointless.

For example, if you're loading data into your database from a source that is only updated hourly, a backfill, whose runs fire in rapid succession, would just import the same data again and again.

This is especially annoying when you instantiate a new hourly task: it runs once for every hour it missed, doing redundant work, before it starts running on the interval you specified.
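
To make the "once for every missed hour" behaviour concrete, here is a minimal sketch, not Airflow's actual scheduler code, that lists the hourly execution dates a scheduler would owe a DAG that has never run since its start_date. It assumes a simplified model in which a run for an interval only becomes due once that interval has ended; the helper name missed_hourly_runs is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def missed_hourly_runs(start_date: datetime, now: datetime) -> List[datetime]:
    """Hypothetical helper (simplified model): execution dates of every
    hourly interval that has fully elapsed between start_date and now."""
    interval = timedelta(hours=1)
    runs = []
    current = start_date
    # The interval [current, current + 1h) is only due once it has ended.
    while current + interval <= now:
        runs.append(current)
        current += interval
    return runs


start = datetime(2017, 3, 1, 0, 0)
now = datetime(2017, 3, 1, 5, 30)
runs = missed_hourly_runs(start, now)
print(len(runs))  # 5: one run per completed hour, 00:00 through 04:00
```

Under this model a DAG switched on five and a half hours after its start_date immediately queues five runs, which for an hourly-refreshed source means re-importing the same snapshot.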

The only solution I can think of is something the FAQ in the docs specifically advises against:

We recommend against using dynamic values as start_date, especially datetime.now() as it can be quite confusing.
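
One commonly cited reason for that advice: the scheduler re-parses DAG files repeatedly, so datetime.now() is re-evaluated on every parse and start_date keeps moving forward. The sketch below is a simplified model, not Airflow's scheduler; first_interval_due is a hypothetical helper that assumes a run is only due once its interval has ended.

```python
from datetime import datetime, timedelta


def first_interval_due(start_date: datetime, now: datetime,
                       interval: timedelta) -> bool:
    """Hypothetical helper (simplified model): the first run is due only
    once the first interval [start_date, start_date + interval) has ended."""
    return start_date + interval <= now


hourly = timedelta(hours=1)

# Static start_date: once an hour has passed, the first run is due.
static_start = datetime(2017, 3, 1, 0, 0)
print(first_interval_due(static_start, datetime(2017, 3, 1, 1, 0), hourly))  # True

# start_date = datetime.now(): each re-parse resets start_date to the parse
# time, so in this model the first interval always ends an hour in the future.
for minutes in (0, 30, 60, 90):
    parse_time = datetime(2017, 3, 1, 0, 0) + timedelta(minutes=minutes)
    dynamic_start = parse_time  # what datetime.now() would return at parse time
    print(first_interval_due(dynamic_start, parse_time, hourly))  # False
```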

Is there any way to disable backfilling for a DAG, or should I do the above?

Answer

Upgrade to Airflow 1.8 or later, then either set catchup_by_default = False in the [scheduler] section of airflow.cfg to turn catchup off for all DAGs, or pass catchup=False to each DAG you don't want backfilled.

https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#catchup_by_default
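
A minimal sketch of the per-DAG option, written against the Airflow 1.x import paths (newer releases replace DummyOperator with EmptyOperator). The dag_id, task_id and start date are hypothetical placeholders. start_date stays a fixed datetime, in line with the FAQ advice quoted in the question; catchup=False is what suppresses the backfill.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id="hourly_load",              # hypothetical DAG id
    start_date=datetime(2017, 3, 1),   # static start_date, as the FAQ recommends
    schedule_interval="@hourly",
    catchup=False,                     # schedule only the latest interval, no backfill
)

# Placeholder task standing in for the real hourly load.
load = DummyOperator(task_id="load_hourly_source", dag=dag)
```

With catchup=False the scheduler creates a run only for the most recent interval instead of one per missed interval. If you ever do want the history, an explicit backfill from the CLI (airflow backfill in 1.x) still works.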