Apache Airflow cannot locate AWS credentials when using boto3 inside a DAG


Problem description

We are migrating to Apache Airflow using ECS Fargate.

The problem we are facing is simple. We have a DAG in which one of the tasks communicates with an external AWS service (say, downloading a file from S3). This is the DAG script:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

from datetime import datetime, timedelta


# default arguments for each task
default_args = {
    'owner': 'thomas',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}


dag = DAG('test_s3_download',
          default_args=default_args,
          schedule_interval=None) 

TEST_BUCKET = 'bucket-dev'
TEST_KEY = 'BlueMetric/dms.json'


# simple download task
def download_file(bucket, key):
    import boto3
    s3 = boto3.resource('s3')
    print(s3.Object(bucket, key).get()['Body'].read())


download_from_s3 = PythonOperator(
    task_id='download_from_s3',
    python_callable=download_file,
    op_kwargs={'bucket': TEST_BUCKET, 'key': TEST_KEY},
    dag=dag)


sleep_task = BashOperator(
    task_id='sleep_for_1',
    bash_command='sleep 1',
    dag=dag)


download_from_s3.set_downstream(sleep_task)

As we have done before when using Docker, we create inside the container, in ~/.aws, a config file that reads:

[default]
region = eu-west-1

As long as the container runs within AWS, every request resolves its credentials automatically, without us specifying any.
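
When in doubt about which identity the process actually ends up with, a quick check (a minimal sketch on our part, not something from the original setup) is to ask STS who the caller is:

# Minimal sketch: print the identity boto3 resolves through its default
# credential chain (env vars, ~/.aws, instance/task metadata, ...). Helps
# confirm which role, if any, the process is running under.
import boto3

sts = boto3.client('sts')
identity = sts.get_caller_identity()
print(identity['Account'], identity['Arn'])

If this raises NoCredentialsError, nothing in the chain (including the ECS task metadata endpoint) is providing credentials to that user/process.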

This is the Dockerfile we are using:

FROM puckel/docker-airflow:1.10.7

USER root

COPY entrypoint.sh /entrypoint.sh
COPY requirements.txt /requirements.txt

RUN apt-get update

RUN ["chmod", "+x", "/entrypoint.sh"]

RUN mkdir -p /home/airflow/.aws \
&& touch /home/airflow/.aws/config \
&& echo '[default]' > /home/airflow/.aws/config \
&& echo 'region = eu-west-1' >> /home/airflow/.aws/config

RUN ["chown", "-R", "airflow", "/home/airflow"]

USER airflow

ENTRYPOINT ["/entrypoint.sh"]

# Expose webUI and flower respectively
EXPOSE 8080
EXPOSE 5555

Everything works like a charm: the directory is created and the ownership change succeeds, but when the DAG runs, it fails with:

...
...
File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
    auth.add_auth(request)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
[2020-08-24 11:15:02,125] {{taskinstance.py:1117}} INFO - All retries failed; marking task as FAILED

So we suspect that the Airflow worker node is running as a different user.

Does any of you know what's going on? Thank you for any advice/light you can provide.

Answer

Create a proper task_role_arn for the task definition. This role is the one assumed by the processes triggered inside the container. Another note is that, with such a role attached, the error should no longer read:

Unable to locate credentials

but rather something like:

Access Denied: you are not authorized to access s3:GetObject
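
For illustration, a hedged sketch of how a Fargate task definition with a task role could be registered via boto3 follows; the family, image, and role ARNs are placeholders, not values taken from the question:

# Sketch only: register an ECS Fargate task definition that attaches a task
# role. All names and ARNs below are hypothetical; adapt them to your account.
import boto3

ecs = boto3.client('ecs', region_name='eu-west-1')

response = ecs.register_task_definition(
    family='airflow-worker',                 # hypothetical family name
    requiresCompatibilities=['FARGATE'],
    networkMode='awsvpc',
    cpu='512',
    memory='1024',
    # Role assumed by processes inside the container; boto3 picks it up via
    # the task metadata endpoint, so no static credentials are needed.
    taskRoleArn='arn:aws:iam::123456789012:role/airflow-task-role',
    # Role used by ECS itself to pull the image and write logs.
    executionRoleArn='arn:aws:iam::123456789012:role/ecsTaskExecutionRole',
    containerDefinitions=[
        {
            'name': 'airflow',
            'image': '123456789012.dkr.ecr.eu-west-1.amazonaws.com/airflow:latest',
            'essential': True,
        }
    ],
)
print(response['taskDefinition']['taskDefinitionArn'])

The task role itself still needs an IAM policy that allows at least s3:GetObject on the bucket and key the DAG reads.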
