Setting up S3 logging in Airflow
Problem description

This is driving me nuts.
I'm setting up airflow in a cloud environment. I have one server running the scheduler and the webserver and one server as a celery worker, and I'm using airflow 1.8.0.
Running jobs works fine. What refuses to work is logging.
I've set up the correct path in airflow.cfg on both servers:
remote_base_log_folder = s3://my-bucket/airflow_logs/
remote_log_conn_id = s3_logging_conn
I've set up s3_logging_conn in the airflow UI, with the access key and the secret key as described here.
I checked the connection using:

s3 = airflow.hooks.S3Hook('s3_logging_conn')
s3.load_string('test', 'test', bucket_name='my-bucket')
This works on both servers. So the connection is properly set up. Yet all I get whenever I run a task is
*** Log file isn't local.
*** Fetching here: http://*******
*** Failed to fetch log file from worker.
*** Reading remote logs...
Could not read logs from s3://my-bucket/airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537
I tried manually uploading a log following the expected conventions, and the webserver still can't pick it up - so the problem is on both ends. I'm at a loss; everything I've read so far tells me this should be working. I'm close to just installing 1.9.0, which I hear changes logging, to see if I have better luck.
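For anyone reproducing the manual-upload test, the key layout can be read straight off the error message above. A minimal sketch (the helper name is mine; the `{try_number}.log` suffix is how I understand Airflow 1.9's default filename template, which 1.8 does not use — verify against your version):

```python
# Hypothetical helper: rebuild the S3 key Airflow reads task logs from.
# Layout inferred from the error above: base/dag_id/task_id/execution_date.
# Airflow 1.9 appends "/{try_number}.log" per its default filename template
# (an assumption here -- check your installed version).
def remote_log_key(base, dag_id, task_id, execution_date_iso, try_number=None):
    parts = [base.rstrip("/"), dag_id, task_id, execution_date_iso]
    if try_number is not None:  # 1.9-style layout
        parts.append("{}.log".format(try_number))
    return "/".join(parts)

# 1.8-style key, matching the path in the error message:
print(remote_log_key("airflow_logs", "my-dag", "my-task",
                     "2018-02-15T21:46:47.577537"))
# airflow_logs/my-dag/my-task/2018-02-15T21:46:47.577537
```

A mismatch between the key you upload to and the key the webserver computes (base folder, trailing slash, execution-date format) is enough to produce the "Could not read logs" message above.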
UPDATE: I made a clean install of Airflow 1.9 and followed the specific instructions here.
The webserver won't even start now, with the following error:
airflow.exceptions.AirflowConfigException: section/key [core/remote_logging] not found in config
That key is present in this config template.
So I tried removing it and just loading the S3 handler without checking first, and I got the following error message instead:
Unable to load the config, contains a configuration error.
Traceback (most recent call last):
  File "/usr/lib64/python3.6/logging/config.py", line 384, in resolve
    self.importer(used)
ModuleNotFoundError: No module named
'airflow.utils.log.logging_mixin.RedirectStdHandler';
'airflow.utils.log.logging_mixin' is not a package
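For context, the 1.9 instructions referenced above involve copying airflow/config_templates/airflow_local_settings.py into your own module and pointing logging_config_class at it; a ModuleNotFoundError like the one above typically means that copy doesn't match the installed Airflow version. A minimal sketch of the S3 handler entry such a copy would carry (class path and key names are my recollection of the 1.9 template, not verified against this install):

```python
# Sketch only: the S3 task-log handler entry added to a copy of Airflow
# 1.9's DEFAULT_LOGGING_CONFIG. Class path and key names are assumptions
# based on the 1.9 config template -- verify against your installed version.
S3_LOG_FOLDER = "s3://my-bucket/airflow_logs/"

S3_TASK_HANDLER = {
    "class": "airflow.utils.log.s3_task_handler.S3TaskHandler",
    "formatter": "airflow.task",
    "base_log_folder": "/home/airflow/logs",  # local staging directory
    "s3_log_folder": S3_LOG_FOLDER,
    "filename_template": (
        "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"
    ),
}
# In the copied template, this dict would replace the default task handler,
# e.g. LOGGING_CONFIG["handlers"]["task"] = S3_TASK_HANDLER (not run here,
# since it needs the real template module).
```

The "class" string is resolved by logging.config at startup, which is exactly where the traceback above originates when the dotted path doesn't match the installed package layout.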
I get the feeling that this shouldn't be this hard.
Any help is greatly appreciated.
Answer
Solved:

- upgraded to 1.9
- ran the steps described in this comment
- added
[core]
remote_logging = True

to airflow.cfg
pip install --upgrade airflow[log]
Everything works fine now.