Guarantee that some operators will be executed on the same airflow worker


Problem description

I have a DAG that:

  1. Downloads a csv file from cloud storage
  2. Uploads the csv file to a third party over https

The Airflow cluster I am executing on uses CeleryExecutor by default, so I'm worried that at some point, when I scale up the number of workers, these tasks may be executed on different workers: e.g. worker A does the download, worker B tries to upload but doesn't find the file (because it's on worker A).

Is it possible to somehow guarantee that both the download and upload operators will be executed on the same airflow worker?
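The setup above can be sketched as a minimal two-task DAG. This is only an illustration of the problem, not code from the question: the task ids, the local path, and the placeholder callables are all hypothetical, and it assumes a recent Airflow 2.x install.

```python
# Minimal sketch of the DAG described above (hypothetical names/paths).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

LOCAL_PATH = "/tmp/report.csv"  # hypothetical path on the worker's local disk


def download_csv():
    # Placeholder for the real cloud-storage download; writes LOCAL_PATH.
    ...


def upload_csv():
    # Placeholder for the https upload; reads LOCAL_PATH, which only exists
    # on the worker that actually ran download_csv.
    ...


with DAG(
    dag_id="download_then_upload",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    download = PythonOperator(task_id="download_csv", python_callable=download_csv)
    upload = PythonOperator(task_id="upload_csv", python_callable=upload_csv)

    # Under CeleryExecutor each task instance is queued independently, so
    # with multiple workers these two tasks may land on different machines.
    download >> upload
```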

Answer

For these kinds of use cases we have two solutions:

  1. Use a network-mounted drive that is shared between the workers, so that both the downloading and uploading tasks have access to the same file system.
  2. Use an Airflow queue that is worker specific. If there is only one worker listening to this queue, you guarantee that both tasks will have access to the same file system. Note that each worker can listen on multiple queues, so you can have it listening on the "default" queue as well as the custom one intended for this task.
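The queue approach can be sketched by setting the `queue` argument (a standard `BaseOperator` parameter used by CeleryExecutor) on both tasks. The queue name `file_io`, the task ids, and the placeholder callables are hypothetical; this assumes a recent Airflow 2.x install.

```python
# Sketch of pinning both tasks to one worker-specific queue
# (hypothetical queue/task names).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def download_csv():
    ...  # placeholder for the real download logic


def upload_csv():
    ...  # placeholder for the real upload logic


with DAG(
    dag_id="download_then_upload_pinned",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    # Both tasks name the same custom queue, so only a worker listening
    # on "file_io" will ever pick them up. Run exactly one such worker
    # and both tasks are guaranteed to share its file system.
    download = PythonOperator(
        task_id="download_csv",
        python_callable=download_csv,
        queue="file_io",
    )
    upload = PythonOperator(
        task_id="upload_csv",
        python_callable=upload_csv,
        queue="file_io",
    )
    download >> upload
```

The dedicated worker is then started with that queue in its listen list, e.g. `airflow celery worker --queues default,file_io`, while all other workers listen on `default` only.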

