Airflow pull docker image from private google container repository

Problem description

I am using the https://github.com/puckel/docker-airflow image to run Airflow. I had to add pip install docker in order for it to support DockerOperator.

Everything seems ok, but I can't figure out how to pull an image from a private google docker container repository.

I tried adding a connection of type Google Cloud in the Admin section and running the DockerOperator as follows:

    t2 = DockerOperator(
            task_id='docker_command',
            image='eu.gcr.io/project/image',
            api_version='2.3',
            auto_remove=True,
            command="/bin/sleep 30",
            docker_url="unix://var/run/docker.sock",
            network_mode="bridge",
            docker_conn_id="google_con"
    )

But I always get an error...

[2019-11-05 14:12:51,162] {{taskinstance.py:1047}} ERROR - No Docker registry URL provided

I also tried the dockercfg_path option:

    t2 = DockerOperator(
            task_id='docker_command',
            image='eu.gcr.io/project/image',
            api_version='2.3',
            auto_remove=True,
            command="/bin/sleep 30",
            docker_url="unix://var/run/docker.sock",
            network_mode="bridge",
            dockercfg_path="/usr/local/airflow/config.json",

    )

I get the following error:

[2019-11-06 13:59:40,522] {{docker_operator.py:194}} INFO - Starting docker container from image eu.gcr.io/project/image
[2019-11-06 13:59:40,524] {{taskinstance.py:1047}} ERROR - ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

I also tried using only dockercfg_path="config.json" and got the same error.

I can't really use the BashOperator to run docker login either, as it does not recognize the docker command:

line 1: docker: command not found

    t3 = BashOperator(
            task_id='print_hello',
            bash_command='docker login -u _json_key -p /usr/local/airflow/config.json eu.gcr.io'
    )

What am I missing?

Solution

airflow.hooks.docker_hook.DockerHook uses the docker_default connection when none is configured.

In your first attempt, you set google_con for docker_conn_id, and the error thrown shows that the host (i.e. the registry name) isn't configured.

Here are a couple of changes to make:

  • The image argument passed to DockerOperator should be set to the image name without the registry name prefixing it:

    DockerOperator(api_version='1.21',
        # docker_url='tcp://localhost:2375', # Set your docker URL
        command='/bin/ls',
        image='image',
        network_mode='bridge',
        task_id='docker_op_tester',
        docker_conn_id='google_con',
        dag=dag,
        # added this to map to host path in MacOS
        host_tmp_dir='/tmp',
        tmp_dir='/tmp',
    )

  • Provide the registry name, username and password in your google_con connection so the underlying DockerHook can authenticate to the registry.

You can obtain long-lived credentials for authentication from a service account key. For the username, use _json_key, and in the password field paste in the contents of the JSON key file.
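If you prefer to set the connection up in code rather than through the Admin UI, here is a minimal sketch; the key-file path is taken from the question, and the host value (registry name plus project, eu.gcr.io/project) is an assumption based on the question's image path and the logs below:

    # Minimal sketch: register google_con from a service account key file.
    # The key path and the host are assumptions from the question; adjust to your setup.
    from airflow import settings
    from airflow.models import Connection

    with open("/usr/local/airflow/config.json") as key_file:
        sa_key = key_file.read()

    conn = Connection(
        conn_id="google_con",
        conn_type="docker",
        host="eu.gcr.io/project",   # registry name (with project), no image name
        login="_json_key",          # literal username for key-file auth on GCR
        password=sa_key,            # full contents of the JSON key file
    )

    session = settings.Session()
    session.add(conn)
    session.commit()

The same values can be entered through Admin -> Connections instead; what matters is that the registry name ends up in the connection's host and not in the operator's image.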

Here are logs from running my task:

[2019-11-16 20:20:46,874] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:46,874] {dagbag.py:88} INFO - Filling up the DagBag from /Users/r7/OSS/airflow/airflow/example_dags/example_docker_operator.py
[2019-11-16 20:20:47,054] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:47,054] {cli.py:592} INFO - Running <TaskInstance: docker_sample.docker_op_tester 2019-11-14T00:00:00+00:00 [running]> on host 1.0.0.127.in-addr.arpa
[2019-11-16 20:20:47,074] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,074] {local_task_job.py:120} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.989537 s
[2019-11-16 20:20:47,088] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,088] {base_hook.py:89} INFO - Using connection to: id: google_con. Host: gcr.io/<redacted-project-id>, Port: None, Schema: , Login: _json_key, Password: XXXXXXXX, extra: {}
[2019-11-16 20:20:48,404] {docker_operator.py:209} INFO - Starting docker container from image alpine
[2019-11-16 20:20:52,066] {logging_mixin.py:89} INFO - [2019-11-16 20:20:52,066] {local_task_job.py:99} INFO - Task exited with return code 0
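
For completeness, applied to the task from the question, the two changes amount to moving the registry prefix out of the image and into the connection. A rough sketch, assuming google_con is configured as above:

    # Sketch of the question's task once google_con carries the registry host.
    t2 = DockerOperator(
            task_id='docker_command',
            image='image',            # image name only, no eu.gcr.io/project prefix
            api_version='auto',       # let docker-py negotiate the API version
            auto_remove=True,
            command="/bin/sleep 30",
            docker_url="unix://var/run/docker.sock",
            network_mode="bridge",
            docker_conn_id="google_con"
    )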
