Airflow doesn't recognise my S3 Connection setting


Question

I am using Airflow with the Kubernetes executor and testing locally (using minikube). While I was able to get it up and running, I can't seem to store my logs in S3. I have tried every solution I could find and I am still getting the following error:

*** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log
*** Fetching from: http://examplepythonoperatorprintthecontext-5b01d602e9d2482193d933e7d2:8793/log/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='examplepythonoperatorprintthecontext-5b01d602e9d2482193d933e7d2', port=8793): Max retries exceeded with url: /log/example_python_operator/print_the_context/2020-03-30T16:02:41.521194+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd00688a650>: Failed to establish a new connection: [Errno -2] Name or service not known'))

I implemented a custom Logging class as mentioned in this answer, with no luck.

  • I use Puckel Airflow 1.10.9
  • Stable Helm chart for airflow from charts/stable/airflow/

My airflow.yaml looks like this:

airflow:
  image:
    repository: airflow-docker-local
    tag: 1

  executor: Kubernetes

  service:
    type: LoadBalancer

  config:
    AIRFLOW__CORE__EXECUTOR: KubernetesExecutor
    AIRFLOW__CORE__TASK_LOG_READER: s3.task
    AIRFLOW__CORE__LOAD_EXAMPLES: True
    AIRFLOW__CORE__FERNET_KEY: ${MASKED_FERNET_KEY}
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://postgres:airflow@airflow-postgresql:5432/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:airflow@airflow-redis-master:6379/0

    # S3 Logging
    AIRFLOW__CORE__REMOTE_LOGGING: True
    AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3://${AWS_ACCESS_KEY_ID}:${AWS_ACCESS_SECRET_KEY}@S3
    AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__CORE__S3_LOG_FOLDER: s3://${BUCKET_NAME}/logs
    AIRFLOW__CORE__LOGGING_LEVEL: INFO
    AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG
    AIRFLOW__CORE__ENCRYPT_S3_LOGS: False
    # End of S3 Logging

    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: True
    AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC: 30
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never
    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: True
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: '{"_request_timeout":[60,60]}'

persistence:
  enabled: true
  existingClaim: ''
  accessMode: 'ReadWriteMany'
  size: 5Gi

logsPersistence:
  enabled: false

workers:
  enabled: true

postgresql:
  enabled: true

redis:
  enabled: true

I have tried setting up the connection via the UI and creating the connection via airflow.yaml, and nothing seems to work. I have been trying for 3 days now with no luck; any help would be much appreciated.
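
For reference, one documented way to create a connection outside the UI is an AIRFLOW_CONN_<CONN_ID> environment variable whose value is a connection URI. A minimal sketch against this chart's config block, assuming a hypothetical connection id s3_log_conn (Airflow upper-cases the id when resolving the variable):

config:
  # Hypothetical: defines a connection named "s3_log_conn" from a URI
  AIRFLOW_CONN_S3_LOG_CONN: s3://${AWS_ACCESS_KEY_ID}:${AWS_ACCESS_SECRET_KEY}@S3
  # remote_log_conn_id then refers to that connection by id, not by URI
  AIRFLOW__CORE__REMOTE_LOG_CONN_ID: s3_log_conn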

I have attached a screenshot for reference.

Answer

I am pretty certain this issue arises because the S3 logging configuration has not been set on the worker pods. The worker pods do not receive configuration set through environment variables such as AIRFLOW__CORE__REMOTE_LOGGING: True. If you want such a variable set in the worker pods as well, you must copy it and prefix the copy's name with AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__, giving AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True.

In this case you would need to duplicate every variable that configures S3 logging and prefix each copy with AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__, as in the sketch below.
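
A minimal sketch of how the S3 logging portion of the question's config block might look after the duplication (the ${...} placeholders are carried over from the question, and ${YOUR_S3_CONN_ID} stands in for whatever connection id is actually used):

config:
  # Existing variables, seen by the scheduler and webserver pods
  AIRFLOW__CORE__REMOTE_LOGGING: True
  AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://${BUCKET_NAME}/logs
  AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG

  # Copies prefixed with AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__;
  # the KubernetesExecutor injects these into every worker pod it launches
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOGGING: True
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: s3://${BUCKET_NAME}/logs
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__REMOTE_LOG_CONN_ID: ${YOUR_S3_CONN_ID}
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__TASK_LOG_READER: s3.task
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__LOGGING_CONFIG_CLASS: log_config.LOGGING_CONFIG
  AIRFLOW__KUBERNETES_ENVIRONMENT_VARIABLES__AIRFLOW__CORE__ENCRYPT_S3_LOGS: False

The same prefix works for any other Airflow setting the worker pods need to see.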
