Airflow parallelism

Question

The Local Executor spawns new processes while scheduling tasks. Is there a limit on the number of processes it creates? I need to change it. Also, what is the difference between the scheduler's "max_threads" and "parallelism" in airflow.cfg?

Recommended answer

parallelism: not a very descriptive name. The description says it sets the maximum number of task instances for the Airflow installation, which is ambiguous: if I have two hosts running Airflow workers, I have Airflow installed on two hosts, so that should count as two installations. Based on context, though, 'per installation' here means 'per Airflow state database'. I'd name this max_active_tasks.

dag_concurrency: despite the name, based on the comment this is actually the task concurrency, and it's per worker. I'd name this max_active_tasks_for_worker (per_worker would suggest that it's a global setting for workers, but I think you can have workers with different values set for this).

max_active_runs_per_dag: this one is more or less alright, but since it seems to be just a default value for the matching DAG kwarg, it would be nice to reflect that in the name, something like default_max_active_runs_for_dags.

So let's move on to the DAG kwargs:

concurrency: again, a general name like this, coupled with the fact that concurrency is used for something different elsewhere, makes this pretty confusing. I'd call this max_active_tasks.

max_active_runs: this one sounds alright to me.
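To make the interaction between the caps more concrete, here is a toy model in Python. It is a simplification for illustration only, not Airflow's scheduler code: the function name free_task_slots and its arguments are hypothetical, and it ignores pools, per-worker limits and max_active_runs. It only shows that the installation-wide parallelism cap and a DAG-level concurrency cap both apply, so the tighter of the two decides.

```python
def free_task_slots(parallelism, running_total, dag_limit, running_in_dag):
    """Toy model: how many more tasks of one DAG could start right now.

    parallelism    -- installation-wide cap on running task instances
    running_total  -- task instances currently running across all DAGs
    dag_limit      -- the DAG's own cap (the `concurrency` kwarg)
    running_in_dag -- task instances of this DAG currently running
    """
    global_headroom = parallelism - running_total
    dag_headroom = dag_limit - running_in_dag
    # Both caps must hold, so the smaller headroom wins; never go below zero.
    return max(0, min(global_headroom, dag_headroom))


# The global cap is the bottleneck: 8 - 6 = 2 slots left overall.
print(free_task_slots(parallelism=8, running_total=6, dag_limit=4, running_in_dag=1))
# The DAG cap is the bottleneck: 4 - 1 = 3 slots left for this DAG.
print(free_task_slots(parallelism=8, running_total=2, dag_limit=4, running_in_dag=1))
# The installation is saturated, so nothing more can start.
print(free_task_slots(parallelism=8, running_total=8, dag_limit=4, running_in_dag=0))
```

In this simplified picture, raising a DAG's concurrency has no effect once parallelism is already exhausted.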

Source: https://issues.apache.org/jira/browse/AIRFLOW-57

max_threads gives the user some control over CPU usage. It specifies scheduler parallelism, i.e. how many threads the scheduler itself uses to schedule DAGs. It does not limit how many task processes the executor runs.
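For orientation, these settings live in airflow.cfg, which uses INI syntax. The sketch below parses an illustrative fragment with Python's standard configparser module. The section placement ([core] and [scheduler]) is what I'd expect in Airflow 1.x releases, the numeric values are placeholders rather than recommendations, and read_limits is a hypothetical helper, not part of Airflow.

```python
import configparser

# Illustrative airflow.cfg fragment; the values are placeholders, not recommendations.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
parallelism = 8
dag_concurrency = 4
max_active_runs_per_dag = 2

[scheduler]
max_threads = 2
"""


def read_limits(cfg_text):
    """Return the concurrency-related settings as a dict of ints (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {
        "parallelism": parser.getint("core", "parallelism"),
        "dag_concurrency": parser.getint("core", "dag_concurrency"),
        "max_active_runs_per_dag": parser.getint("core", "max_active_runs_per_dag"),
        "max_threads": parser.getint("scheduler", "max_threads"),
    }


limits = read_limits(SAMPLE_CFG)
for name, value in limits.items():
    print(f"{name} = {value}")
```

As I understand it, with the LocalExecutor it is parallelism under [core] that bounds how many task processes run at once, so that is the value to change if the goal is to limit spawned processes. max_threads under [scheduler] only affects the scheduler's own work.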
