Apache Airflow cannot locate AWS credentials when using boto3 inside a DAG
Problem description
We are migrating to Apache Airflow running on ECS Fargate.
The problem we are facing is simple. We have a simple DAG in which one of the tasks communicates with an external AWS service (say, downloading a file from S3). This is the DAG script:
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from datetime import datetime, timedelta

# default arguments for each task
default_args = {
    'owner': 'thomas',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG('test_s3_download',
          default_args=default_args,
          schedule_interval=None)

TEST_BUCKET = 'bucket-dev'
TEST_KEY = 'BlueMetric/dms.json'

# simple download task
def download_file(bucket, key):
    import boto3
    s3 = boto3.resource('s3')
    print(s3.Object(bucket, key).get()['Body'].read())

download_from_s3 = PythonOperator(
    task_id='download_from_s3',
    python_callable=download_file,
    op_kwargs={'bucket': TEST_BUCKET, 'key': TEST_KEY},
    dag=dag)

sleep_task = BashOperator(
    task_id='sleep_for_1',
    bash_command='sleep 1',
    dag=dag)

download_from_s3.set_downstream(sleep_task)
As we have done other times when using Docker, we create the config file in ~/.aws inside the container, which reads:
[default]
region = eu-west-1
As long as the container runs within the AWS boundary, it will resolve every request without any need to specify credentials.
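For context, "without any need to specify credentials" relies on boto3's default provider chain. On ECS/Fargate the chain does not pick credentials up from ~/.aws/config (that file only supplies the region); it fetches them from the container credentials endpoint that ECS advertises through an environment variable when the task has an IAM role attached. A minimal sketch of that check (the helper name is ours for illustration, not a boto3 API):

```python
import os

# Hedged sketch: ECS injects AWS_CONTAINER_CREDENTIALS_RELATIVE_URI into the
# container only when the task definition has a task role attached; boto3's
# default chain uses it to fetch temporary credentials. With only a region in
# ~/.aws/config and no such role, the SDK raises NoCredentialsError.
def has_task_role_credentials(env=None):
    env = os.environ if env is None else env
    return "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" in env

# Simulated environments for illustration:
print(has_task_role_credentials({"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/xyz"}))  # True
print(has_task_role_credentials({}))  # False
```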
This is the Dockerfile we are using:
FROM puckel/docker-airflow:1.10.7
USER root
COPY entrypoint.sh /entrypoint.sh
COPY requirements.txt /requirements.txt
RUN apt-get update
RUN ["chmod", "+x", "/entrypoint.sh"]
RUN mkdir -p /home/airflow/.aws \
    && touch /home/airflow/.aws/config \
    && echo '[default]' > /home/airflow/.aws/config \
    && echo 'region = eu-west-1' >> /home/airflow/.aws/config
RUN ["chown", "-R", "airflow", "/home/airflow"]
USER airflow
ENTRYPOINT ["/entrypoint.sh"]

# Expose webUI and flower respectively
EXPOSE 8080
EXPOSE 5555
and everything works like a charm. The directory and the change of owner are done successfully, but when the DAG runs it fails with:
...
...
File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/signers.py", line 160, in sign
auth.add_auth(request)
File "/usr/local/airflow/.local/lib/python3.7/site-packages/botocore/auth.py", line 357, in add_auth
raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials
[2020-08-24 11:15:02,125] {{taskinstance.py:1117}} INFO - All retries failed; marking task as FAILED
So we are thinking that the Airflow worker node uses a different user.
Does any of you know what's going on? Thank you for any advice/light you can provide.
Recommended answer
Create a proper task_role_arn for the task definition. This role is the one assumed by the processes triggered inside the container. Another note is that the error should not read:
Unable to locate credentials
but
Access denied: you don't have permission to access s3:GetObject.
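To make the fix concrete, here is a hedged sketch of an ECS task definition with a task role attached. The role names, account id, and sizing values are placeholders, not from the question; the key point is taskRoleArn, which is the role boto3 inside the DAG will assume:

```python
# Hedged sketch: a minimal Fargate task definition dict with a task role.
# All ARNs and names below are illustrative placeholders.
task_definition = {
    "family": "airflow-worker",
    # taskRoleArn: assumed by code running INSIDE the container
    # (this is what boto3 in the DAG uses to sign S3 requests).
    "taskRoleArn": "arn:aws:iam::123456789012:role/airflow-task-role",
    # executionRoleArn: used by ECS itself to pull the image and ship logs.
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecs-execution-role",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "512",
    "memory": "1024",
    "containerDefinitions": [
        {"name": "airflow", "image": "puckel/docker-airflow:1.10.7", "essential": True}
    ],
}

# This dict could then be registered with boto3's ECS client, e.g.:
#   boto3.client("ecs").register_task_definition(**task_definition)
print("taskRoleArn" in task_definition)  # True
```

Make sure the role's policy grants s3:GetObject on the bucket/key the DAG reads; with the role attached, the NoCredentialsError disappears.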