How to use a volume with the DockerOperator from Apache Airflow

Problem description

I am developing an ETL process to be scheduled and orchestrated with Apache Airflow using the DockerOperator. I am working on a Windows laptop, so I can only run Apache Airflow from inside a Docker container. I was able to mount a folder with config files from my Windows laptop (called configs below) into the Airflow container (named webserver below) using a volume specified in the docker-compose.yml file in my project root directory. The relevant part of the docker-compose.yml file is shown below:

version: '2.1'
services:
    webserver:
        build: ./docker-airflow
        restart: always
        privileged: true
        depends_on:
            - mongo
            - mongo-express
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        volumes:
            - ./docker-airflow/dags:/usr/local/airflow/dags
            # Volume for source code
            - ./src:/src
            - ./docker-airflow/workdir:/home/workdir
            # configs folder as volume
            - ./configs:/configs
            # Mount the docker socket from the host (currently my laptop) into the webserver container so that the webserver container can create "sibling" containers
            - //var/run/docker.sock:/var/run/docker.sock  # the two "//" are needed on Windows
        ports:
            - 8081:8080
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
        networks:
            - mynet

Now I want to pass this configs folder with all its content on to the containers which are created by the DockerOperator. Although this configs folder was apparently mounted into the webserver container's file system, this configs folder is completely empty and because of that, my DAG fails. The code for the DockerOperator is as follows:

# Import path assumed for Airflow 1.10.x, where DockerOperator accepts `volumes`
from airflow.operators.docker_operator import DockerOperator

cmd = "--config_filepath {} --data_object_name {}".format("/configs/dev.ini", some_data_object)
staging_op = DockerOperator(
    command=cmd,
    task_id="my_task",
    image="{}/{}:{}".format(docker_hub_username, docker_hub_repo_name, image_name),
    api_version="auto",
    auto_remove=False,
    network_mode=docker_network,
    force_pull=True,
    volumes=["/configs:/configs"],  # "absolute_path_host:absolute_path_container"
)

According to the documentation, the left side of the volume must be an absolute path on the host, which (if I understood correctly) is the webserver container in this case (because it creates separate containers for every task). The right side of the volume is a directory inside the task's container which is created by the DockerOperator. As mentioned above, the configs folder inside the task's container does exist, but is completely empty. Does anyone know why this is the case and how to fix it?

Thank you very much for your help!

Recommended answer

After implementing the suggestions from here, the volumes in the constructor of the DockerOperator need to be specified as follows:

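# Import path assumed for Airflow 1.10.x, where DockerOperator accepts `volumes`
from airflow.operators.docker_operator import DockerOperator

cmd = "--config_filepath {} --data_object_name {}".format("/configs/dev.ini", some_data_object)
staging_op = DockerOperator(
    command=cmd,
    task_id="my_task",
    image="{}/{}:{}".format(docker_hub_username, docker_hub_repo_name, image_name),
    api_version="auto",
    auto_remove=False,
    network_mode=docker_network,
    force_pull=True,
    # Host side is a path on the Windows laptop, not inside the webserver container
    volumes=['/c/Users/kevin/dev/myproject/app/configs:/app/configs'],  # "absolute_path_host:absolute_path_container"
)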

Maybe the file paths need to look like that, because Docker runs inside a VM on Windows?
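To avoid typing the `/c/Users/...` form by hand, the conversion from a regular Windows path can be sketched in a few lines of Python. This is only an illustration: the helper names `to_docker_host_path` and `volume_spec` are hypothetical, and it assumes your Docker setup expects the lowercase-drive form used in the snippet above.

```python
from pathlib import PureWindowsPath


def to_docker_host_path(windows_path: str) -> str:
    """Convert e.g. C:\\Users\\kevin\\x to /c/Users/kevin/x (hypothetical helper)."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # "C:" for a normal absolute Windows path
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError("expected an absolute path with a drive letter: %r" % windows_path)
    # path.parts[0] is the anchor ("C:\\"); the rest are the folder names.
    return "/" + "/".join([drive[0].lower(), *path.parts[1:]])


def volume_spec(windows_host_path: str, container_path: str) -> str:
    """Build the "absolute_path_host:absolute_path_container" string."""
    return "{}:{}".format(to_docker_host_path(windows_host_path), container_path)
```

For example, `volume_spec(r"C:\Users\kevin\dev\myproject\app\configs", "/app/configs")` yields the same string that is passed to `volumes` above.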

As @sarnu also mentioned, it is important to understand that the host-side paths are paths on my Windows laptop. The containers created for each task are started through the mounted Docker socket, so they run alongside the Airflow container as sibling containers rather than inside it.
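Because the host-side path belongs to the laptop and not to the webserver container, one way to keep it out of the DAG code is to hand it to the webserver container as an environment variable and read it when building the `volumes` argument. The following is a minimal sketch under that assumption: the variable name `HOST_CONFIGS_DIR` and the helper `task_volumes` are hypothetical and not part of the original setup.

```python
import os


def task_volumes(container_dir="/app/configs", env_var="HOST_CONFIGS_DIR", environ=None):
    """Return a `volumes` list whose host side is read from an environment variable.

    `HOST_CONFIGS_DIR` is a hypothetical variable that would have to be set on the
    webserver container (e.g. in docker-compose.yml) to the path on the laptop.
    """
    env = os.environ if environ is None else environ
    host_dir = env.get(env_var)
    if not host_dir:
        raise RuntimeError("{} is not set in the Airflow container".format(env_var))
    if not host_dir.startswith("/"):
        raise ValueError("host path must be absolute, got {!r}".format(host_dir))
    return ["{}:{}".format(host_dir, container_dir)]
```

The DAG could then pass `volumes=task_volumes()` to the DockerOperator, so that moving the project to another machine only requires changing the environment variable.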
