Airflow - Splitting DAG definition across multiple files


Question

Just getting started with Airflow and wondering what best practices are for structuring large DAGs. For our ETL, we have a lot of tasks that fall into logical groupings, yet the groups are dependent on each other. Which of the following would be considered best practice?


  • One big DAG file with all tasks in it

  • Split the DAG definition across multiple files (how to do this?)

  • Define multiple DAGs, one per group of tasks, and set up dependencies between them with ExternalTaskSensor

Other suggestions are also welcome.

Answer

DAGs are just Python files, so you can split a single DAG definition across multiple files. The different files should just have functions that take in a DAG object and create tasks using that DAG object.
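The "builder function" pattern described above can be sketched without a running Airflow install. Below, a tiny `FakeDag` class stands in for Airflow's `DAG` class so the example is self-contained; the module names and builder functions (`extract_tasks.py`, `add_extract_tasks`, and so on) are purely illustrative. In a real project each builder would create operators with `dag=dag` instead.

```python
# Minimal, self-contained sketch of splitting one DAG across files.
# FakeDag stands in for airflow.DAG so the example runs anywhere.
class FakeDag:
    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.task_ids = []  # a real DAG tracks its operators similarly

    def add_task(self, task_id):
        self.task_ids.append(task_id)

# extract_tasks.py (hypothetical module): receives the shared dag object
# and registers this group's tasks on it -- no DAG is created here.
def add_extract_tasks(dag):
    for task_id in ("extract_orders", "extract_users"):
        dag.add_task(task_id)

# transform_tasks.py (hypothetical module): same contract.
def add_transform_tasks(dag):
    dag.add_task("transform_join")

# Main DAG file: the ONLY place a dag object exists at global scope,
# so the scheduler sees exactly one DAG.
dag = FakeDag("etl")
add_extract_tasks(dag)
add_transform_tasks(dag)
print(dag.task_ids)  # both groups' tasks live in the one DAG
```

Because only the main file puts a DAG at module scope, Airflow still discovers exactly one DAG, while each task group lives in its own importable module.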

Note, though, that you should have just a single DAG object in the global scope. Airflow picks up every DAG object in the global scope as a separate DAG.

It is often considered good practice to keep each DAG as concise as possible. However, if you need to set up such dependencies, you could consider using subdags. More about this here: https://airflow.incubator.apache.org/concepts.html?highlight=subdag#scope
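A rough sketch of the subdag approach, in the Airflow 1.x style current when this was written (it needs a running Airflow install to execute, and all DAG/task ids here are made up). Each task group becomes a child DAG attached to the parent via `SubDagOperator`, so group-level dependencies stay inside one DAG:

```python
from datetime import datetime

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.subdag_operator import SubDagOperator

def group_subdag(parent_dag_id, group_name, default_args):
    # The child dag_id must be "<parent>.<group>" so the scheduler
    # can pair it with its SubDagOperator.
    subdag = DAG(
        dag_id="{}.{}".format(parent_dag_id, group_name),
        default_args=default_args,
        schedule_interval="@daily",
    )
    BashOperator(task_id="work", bash_command="echo work", dag=subdag)
    return subdag

default_args = {"start_date": datetime(2017, 1, 1)}
main_dag = DAG("etl", default_args=default_args, schedule_interval="@daily")

extract = SubDagOperator(
    task_id="extract",
    subdag=group_subdag("etl", "extract", default_args),
    dag=main_dag,
)
transform = SubDagOperator(
    task_id="transform",
    subdag=group_subdag("etl", "transform", default_args),
    dag=main_dag,
)
extract >> transform  # dependency between whole groups, inside one DAG
```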

You could also use ExternalTaskSensor, but beware that as the number of DAGs grows, it might get harder to handle external dependencies between tasks. I think subdags might be the way to go for your use case.
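For completeness, a sketch of the ExternalTaskSensor option in the Airflow 1.x style (again, this fragment needs a running Airflow install, and the DAG/task ids are hypothetical). The downstream DAG gates on a task in the upstream DAG:

```python
from datetime import datetime

from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.sensors import ExternalTaskSensor

dag = DAG(
    "load_group",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
)

# Wait until task "transform_done" in DAG "transform_group" has succeeded
# for the same execution date.
wait_for_transform = ExternalTaskSensor(
    task_id="wait_for_transform",
    external_dag_id="transform_group",
    external_task_id="transform_done",
    dag=dag,
)

load = BashOperator(task_id="load", bash_command="echo load", dag=dag)
wait_for_transform >> load
```

By default the sensor polls for the external task at the same execution date, so the two DAGs should share a schedule (or you can pass `execution_delta` to offset them).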

