Airflow + Celery or Dask. For what, when?


Question

I read in the official Airflow documentation the following:

What does this mean exactly? What do the authors mean by scaling out? That is, when is Airflow on its own not enough, and when would someone use Airflow in combination with something like Celery? (The same question applies to Dask.)

Answer

In Airflow terminology, an "Executor" is the component responsible for running your tasks. The LocalExecutor does this by spawning processes on the machine Airflow runs on and letting each process execute a task.
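
To make the local model concrete, here is a minimal sketch in plain Python. It is not Airflow's actual implementation: `run_locally` and its `parallelism` argument are hypothetical names chosen for this illustration, and the sketch assumes each task is a shell-style command run as a child process on the same machine.

```python
import subprocess
import sys


def run_locally(commands, parallelism=2):
    """Run each command in its own child process on this machine.

    Hypothetical helper for illustration only. At most `parallelism`
    processes run at once, so throughput is capped by local resources.
    Returns the exit codes in the order the commands were given.
    """
    pending = list(commands)
    running = []
    exit_codes = []
    while pending or running:
        # Start new child processes while there is spare capacity.
        while pending and len(running) < parallelism:
            running.append(subprocess.Popen(pending.pop(0)))
        # Wait for the oldest running process before starting more.
        proc = running.pop(0)
        exit_codes.append(proc.wait())
    return exit_codes


if __name__ == "__main__":
    # Three trivial "tasks"; the second one fails with exit code 3.
    tasks = [
        [sys.executable, "-c", "import sys; sys.exit(0)"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        [sys.executable, "-c", "import sys; sys.exit(0)"],
    ]
    print(run_locally(tasks, parallelism=2))
```

However large `parallelism` is set, every task still competes for the CPU and memory of that one machine, which is the limit the distributed executors address.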

Naturally, your capacity is then limited by the resources available on the local machine. The CeleryExecutor instead distributes the load across several machines. The executor itself publishes a request to execute a task to a queue, and one of several worker nodes picks up the request and executes it. You can then scale the cluster of worker nodes to increase overall capacity.
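
The queue-and-workers pattern described above can be sketched with the Python standard library alone. This is only an illustration of the idea, not Celery or Airflow code: threads stand in for worker machines, an in-process `queue.Queue` stands in for the message broker, and the names `worker` and `run_with_workers` are hypothetical.

```python
import queue
import threading


def worker(name, task_queue, results):
    """Stand-in for one worker node: pick up task requests until stopped."""
    while True:
        task_id = task_queue.get()
        if task_id is None:  # sentinel value: no more work for this worker
            break
        # "Executing" the task here just records which worker handled it.
        results.append((name, task_id))


def run_with_workers(task_ids, num_workers):
    """Publish task requests to a shared queue served by `num_workers` workers."""
    task_queue = queue.Queue()
    results = []
    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}", task_queue, results))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    # The executor side only publishes requests; it does not run tasks itself.
    for task_id in task_ids:
        task_queue.put(task_id)
    # One sentinel per worker so that each of them shuts down.
    for _ in workers:
        task_queue.put(None)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for name, task_id in run_with_workers(["extract", "transform", "load"], 2):
        print(f"{name} ran {task_id}")
```

Adding capacity in this model means raising `num_workers`; in a real deployment that corresponds to starting more worker machines that consume from the same queue, without changing the side that publishes the requests.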

Finally, although it is not ready yet, there is a KubernetesExecutor in the works. This will run tasks on a Kubernetes cluster. Because the tasks run in containers, they will be completely isolated from one another, and you can also leverage existing Kubernetes capabilities, for instance to autoscale your cluster so that you always have an appropriate amount of resources available.
