Setting up S3 for logs in Airflow

Problem description

I am using docker-compose to set up a scalable Airflow cluster. I based my approach off of this Dockerfile: https://hub.docker.com/r/puckel/docker-airflow/

My problem is getting the logs set up to write/read from S3. When a DAG has completed, I get an error like this:

*** Log file isn't local.
*** Fetching here: http://ea43d4d49f35:8793/log/xxxxxxx/2017-06-26T11:00:00
*** Failed to fetch log file from worker.

*** Reading remote logs...
Could not read logs from s3://buckets/xxxxxxx/airflow/logs/xxxxxxx/2017-06-26T11:00:00

I set up a new section in the airflow.cfg file like this:

[MyS3Conn]
aws_access_key_id = xxxxxxx
aws_secret_access_key = xxxxxxx
aws_default_region = xxxxxxx

And then in airflow.cfg:

remote_base_log_folder = s3://buckets/xxxx/airflow/logs
remote_log_conn_id = MyS3Conn
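
For reference, a minimal sketch of where these keys sit, assuming an Airflow 1.8-era airflow.cfg layout (the [core] placement and the encrypt_s3_logs default are taken from that era's default config, not from the question itself):

[core]
# ... other [core] settings unchanged ...
remote_base_log_folder = s3://buckets/xxxx/airflow/logs
remote_log_conn_id = MyS3Conn
# assumed default of that era; set to True only if server-side encryption of the logs is wanted
encrypt_s3_logs = False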

Did I set this up properly and there is a bug? Is there a recipe for success here that I am missing?

-- UPDATE

I tried exporting in URI and JSON formats and neither seemed to work. I then exported the aws_access_key_id and aws_secret_access_key and Airflow started picking it up. Now I get this error in the worker logs:

6/30/2017 6:05:59 PM INFO:root:Using connection to: s3
6/30/2017 6:06:00 PM ERROR:root:Could not read logs from s3://buckets/xxxxxx/airflow/logs/xxxxx/2017-06-30T23:45:00
6/30/2017 6:06:00 PM ERROR:root:Could not write logs to s3://buckets/xxxxxx/airflow/logs/xxxxx/2017-06-30T23:45:00
6/30/2017 6:06:00 PM Logging into: /usr/local/airflow/logs/xxxxx/2017-06-30T23:45:00

-- UPDATE

I also found this link: https://www.mail-archive.com/dev@airflow.incubator.apache.org/msg00462.html

I then shelled into one of my worker machines (separate from the webserver and scheduler) and ran this bit of code in Python:

import airflow
s3 = airflow.hooks.S3Hook('s3_conn')
s3.load_string('test', airflow.conf.get('core', 'remote_base_log_folder'))

I got this error:

boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden

I tried exporting several different types of AIRFLOW_CONN_ envs as explained in the connections section here, https://airflow.incubator.apache.org/concepts.html, and by other answers to this question:

s3://<AWS_ACCESS_KEY_ID>:<AWS_SECRET_ACCESS_KEY>@S3

{"aws_account_id":"<xxxxx>","role_arn":"arn:aws:iam::<xxxx>:role/<xxxxx>"}

{"aws_access_key_id":"<xxxxx>","aws_secret_access_key":"<xxxxx>"}

I have also exported AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, with no success.

These credentials are being stored in a database, so once I add them in the UI they should be picked up by the workers, but they are not able to write/read logs for some reason.
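
A minimal sketch of checking that a worker can actually see that row (assuming Airflow 1.8-era models and the MyS3Conn id used above; run on the worker itself):

from airflow import settings
from airflow.models import Connection

# Query the metadata database this worker is configured against. If the row
# is missing here, the worker is pointing at a different database than the
# webserver where the connection was created in the UI.
session = settings.Session()
conn = session.query(Connection).filter(Connection.conn_id == 'MyS3Conn').first()
print(conn.extra_dejson if conn else 'connection not found')
session.close()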

Recommended answer

You need to set up the S3 connection through the Airflow UI. For this, go to the Admin -> Connections tab in the Airflow UI and create a new row for your S3 connection.

An example configuration would be:

Conn Id: my_conn_S3

Conn Type: S3

Extra: {"aws_access_key_id":"your_aws_key_id", "aws_secret_access_key": "your_aws_secret_key"}
