Airflow doesn't recognise my S3 Connection setting
Question
I am using Airflow with the Kubernetes executor and testing locally (using minikube). While I was able to get it up and running, I can't seem to store my logs in S3. I have tried all the solutions I could find described, and I am still getting the following error:
*** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log
*** Fetching from: http://examplepythonoperatorprintthecontext-5b01d602e9d2482193d933e7d2:8793/log/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='examplepythonoperatorprintthecontext-5b01d602e9d2482193d933e7d2', port=8793): Max retries exceeded with url: /log/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd00688a650>: Failed to establish a new connection: [Errno -2] Name or service not known'))
I implemented a custom logging class as mentioned in this answer, and still no luck.
- I use Puckel airflow 1.10.9
- Stable Helm chart for airflow from charts/stable/airflow/
My airflow.yaml looks like this:
airflow:
  image:
    repository: airflow-docker-local
    tag: 1
  executor: Kubernetes
  service:
    type: LoadBalancer
  config:
    AIRFLOW__CORE__EXECUTOR: KubernetesExecutor
    AIRFLOW__CORE__TASK_LOG_READER: s3.task
    AIRFLOW__CORE__LOAD_EXAMPLES: True
    AIRFLOW__CORE__FERNET_KEY: ${MASKED_FERNET_KEY}
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://postgres:airflow@airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:airflow@airflow-redis-master:6379/0
    # S3 Logging
    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3://${AWS_ACCESS_KEY_ID}:${AWS_ACCESS_SECRET_KEY}@S3
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__CORE__S3_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__CORE__LOGGING_LEVEL: INFO
    AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
    # End of S3 Logging
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: True
    AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC: 30
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: True
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: '{\"_request_timeout\":[60,60]}'
persistence:
  enabled: true
  existingClaim: ''
  accessMode: 'ReadWriteMany'
  size: 5Gi
logsPersistence:
  enabled: false
workers:
  enabled: true
postgresql:
  enabled: true
redis:
  enabled: true
I have tried setting up the connection via the UI and creating the connection via airflow.yaml, and nothing seems to work. I have been trying this for 3 days now with no luck; any help would be much appreciated.
I have attached a screenshot for reference.
Recommended answer
I am pretty certain this issue occurs because the S3 logging configuration has not been set on the worker pods. Worker pods do not receive configuration that is set through environment variables such as AIRFLOW__CORE__REMOTE_LOGGING: True. If you want this variable set in the worker pod, you must copy it and prepend AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__ to the copied environment variable name: AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True.
In this case you would need to duplicate all of the variables that specify config for S3 logging and prepend AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__ to the names of the copies.
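As a rough sketch of what that duplication might look like, the fragment below adds prefixed copies of the variables from the "S3 Logging" block of the question's airflow.yaml. It assumes the copies live under the same airflow.config key as the originals and that every variable in that block is needed on the workers; the values are copied unchanged from the question and are not verified here. The original, unprefixed variables stay in place for the scheduler and webserver.

```yaml
airflow:
  config:
    # Prefixed copies of the S3 logging settings, intended for the worker pods.
    # Assumption: each name is the original variable with
    # AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__ prepended.
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3://${AWS_ACCESS_KEY_ID}:${AWS_ACCESS_SECRET_KEY}@S3
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__S3_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__LOGGING_LEVEL: INFO
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG
    AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
```

After redeploying the chart with these entries, one way to check the result would be to inspect the environment of a freshly launched worker pod and confirm that the unprefixed names (for example AIRFLOW__CORE__REMOTE_LOGGING) are present there.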