Can I increase the processing speed by adding more CPUs to operators in Airflow?

Question

In airflow.cfg there is a section called [operators], where default_cpus was set to 1 and default_ram and default_disk were both set to 512.
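
For reference, the relevant part of airflow.cfg can be read with Python's standard `configparser` module. This is a minimal sketch, not Airflow's own configuration loader. It includes only the three keys mentioned above, with the values from the question, and shows that they are plain integers stored in the file:

```python
import configparser

# Fragment of airflow.cfg as described in the question; only the
# three keys mentioned there are included.
AIRFLOW_CFG = """
[operators]
default_cpus = 1
default_ram = 512
default_disk = 512
"""


def read_operator_defaults(text: str) -> dict:
    """Return the [operators] resource defaults as integers."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["operators"]
    return {
        key: section.getint(key)
        for key in ("default_cpus", "default_ram", "default_disk")
    }


defaults = read_operator_defaults(AIRFLOW_CFG)
print(defaults)  # {'default_cpus': 1, 'default_ram': 512, 'default_disk': 512}
```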

I would like to understand whether I will get any improvement in processing speed if I increase these parameters.

Answer

I took a look at the sources: these settings are available to all operators, but they are never used, either by the operators themselves or by any executor.

So I went back a little into the history and looked at the commit that introduced those settings. Quoting the JIRA ticket that led to that PR, they are:

optional resource requirements for use with resource managers such as yarn and mesos

The Mesos executor, however, is a community contribution that does not leverage these properties and just assigns the same amount of resources to every task. The YARN executor does not exist yet, AFAIK (as of version 1.9).
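
To make the difference concrete, here is a minimal sketch in plain Python (no Airflow involved). It contrasts an executor that hands every task the same fixed slot with a hypothetical resource-aware one that honours each task's own request. `TaskRequest`, `uniform_allocation` and `per_task_allocation` are illustrative names invented for this example; they are not Airflow APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TaskRequest:
    """Hypothetical stand-in for a task plus the resources its operator requests."""
    task_id: str
    cpus: int
    ram: int  # same unit as default_ram in airflow.cfg


Allocation = Dict[str, Tuple[int, int]]


def uniform_allocation(tasks: List[TaskRequest], cpus: int, ram: int) -> Allocation:
    """Give every task the same fixed slot, ignoring its own request."""
    return {t.task_id: (cpus, ram) for t in tasks}


def per_task_allocation(tasks: List[TaskRequest]) -> Allocation:
    """Honour each task's own request (what a resource-aware executor would do)."""
    return {t.task_id: (t.cpus, t.ram) for t in tasks}


tasks = [
    TaskRequest("heavy_compute", cpus=4, ram=4096),
    TaskRequest("copy_data", cpus=1, ram=512),
]
print(uniform_allocation(tasks, cpus=1, ram=512))
# {'heavy_compute': (1, 512), 'copy_data': (1, 512)}
print(per_task_allocation(tasks))
# {'heavy_compute': (4, 4096), 'copy_data': (1, 512)}
```

With the uniform strategy, raising the CPU request on `heavy_compute` changes nothing about what it actually receives. The same applies to the `default_cpus` setting when no executor reads it.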

I once had a discussion with the Airflow team to find out whether there was a way to assign resources on a per-task basis using the Mesos executor. They replied with their strategy for assigning resources to tasks using the Celery executor, which may help you understand how to manage resources.

As for the core question you are asking, in a more general sense: the throughput you can get out of a task relative to the resources assigned to it depends a lot on the task itself. A compute-intensive task that can leverage multiple processors will see a speedup if you assign it multiple cores, while an I/O-intensive task (like copying data between different systems) will probably not see much improvement.
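
One back-of-the-envelope way to see why is Amdahl's law, a standard model that the original answer does not itself invoke. If only a fraction of a task's runtime is CPU work that can actually be spread across cores, the rest (for example, waiting on network or disk) does not shrink when cores are added. The workload fractions below are made-up illustrations, not measurements:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the runtime
    can be spread across `cores` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical workloads: the fractions are illustrative, not measured.
compute_bound = amdahl_speedup(0.95, 4)  # about 3.48x with 4 cores
io_bound = amdahl_speedup(0.10, 4)       # about 1.08x with 4 cores
print(round(compute_bound, 2), round(io_bound, 2))  # 3.48 1.08
```

This is an upper bound. It also assumes the task's code really uses the extra cores (for example via multiple processes). A single-threaded task gains nothing from a larger CPU allocation.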