气流设置可实现高可用性 [英] Airflow setup for high availability

查看:79
本文介绍了气流设置可实现高可用性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何以高可用性部署Apache气流(通常称为airbnb的气流)调度程序?

How to deploy apache airflow (formally known as airbnb's airflow) scheduler in high availability?

我不是在问应该使用的后端数据库或RabbitMQ显然是在高可用性配置中部署的。

我的主要重点是调度程序-是否需要做一些特殊的事情?

My main focus is the scheduler - is there something special needs to be done?

推荐答案

经过一番挖掘,我发现同时运行多个调度程序并不安全,这意味着开箱即用-气流调度程序不安全使用在高可用性环境中。

After a bit digging I found that it is not safe to run multiple schedulers simoultanously, this means that out of the box - airflow schedulers are not safe to use in high availablity environments.

气流团队正计划通过在DAG数据结构上添加锁定机制来解决此问题,但这尚未实现(我检查了运行2个调度程序,发现它们调度相同的dag实例,这不好)。
此处描述:
https:// groups .google.com / forum /#!topic / airbnb_airflow / -1wKa3OcwME

The airflow team are planning to solve this issue by adding a lock mechanism on the DAG data structure, but this is not implemented yet (I checked by running 2 schedulers and saw that they schedule the same dag instances which is not good). This is described here: https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME

我确实找到了一种解决方法,可以通过将调度程序与我自己的代码,并使用集群工具进行领导者选举(我为此目的亲自使用领事)。这样,只有当选的主机正在运行调度程序,而当主机关闭时,从机将替换它。

I did found a way to workaround this high availalbilty issue by wrapping the schedulers with my own code and use cluster tools for leader election (I personanlly use consul for this purpose). This way only the elected master is running the scheduler and when the master is down the slave replaces him.

当您在高可用性环境中使用气流时,请考虑这一点,因为在框中,气流调度程序当前不适合此操作(除非您自己解决此问题)。

Please consider this when u use airflow in high availabilty environments since out of the box, airflow scheduler is currently not suitable for this (unless you solve this issue yourself).

编辑-主从解决方案的另一种方法是使用群集经理/调度程序,以确保始终只有一个气流调度程序实例可用。这种方法依赖于您拥有的集群管理器的自我修复能力。例如,mesos和nomad都支持这种配置(我为简单起见选择了nomad)。

Edit - an alternative approach to the master slave solution is to use a cluster manager/scheduler to make sure that only one airflow scheduler instance is always available. This approach relies on the self healing abilities of the cluster manager u have. For example both mesos and nomad supports this kind of configuration (I presonally chose nomad for its simplicity).

这篇关于气流设置可实现高可用性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆