How to use Apache Airflow in a virtual environment?


Question

I am quite new to using Apache Airflow. I use PyCharm as my IDE. I create a project (Anaconda environment) and a Python script that includes DAG definitions and Bash operators. When I open my Airflow webserver, my DAGs are not shown; only the default example DAGs are. My AIRFLOW_HOME variable contains ~/airflow, so I stored my Python script there and now it shows up.

How do I use this in a project environment?

Do I change the environment variable at the start of every project?

Is there a way to add specific airflow home directories for each project?

I don't want to store my DAGs in the default Airflow directory, since I want to add them to my Git repository. Kindly help me out.

Answer

You can set/override Airflow options specified in ${AIRFLOW_HOME}/airflow.cfg with environment variables by using this format: $AIRFLOW__{SECTION}__{KEY} (note the double underscores); see the Airflow docs for details. So you can simply do

export AIRFLOW__CORE__DAGS_FOLDER=/path/to/dags/folder
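
The same pattern works for any option that lives in airflow.cfg. For example (an extra illustration, not part of the original answer), you could also hide the bundled example DAGs mentioned in the question:

# [core] load_examples controls whether the bundled example DAGs are loaded
export AIRFLOW__CORE__LOAD_EXAMPLES=False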

However, it is tedious and error-prone to do this for different projects. As an alternative, you can consider using pipenv for managing virtual environments instead of Anaconda (there are good guides about pipenv and the problems it solves). One of the default features of pipenv is that it automatically loads variables defined in a .env file when you spawn a shell with the virtualenv activated. So here is what your workflow with pipenv could look like:

cd /path/to/my_project

# Creates venv with python 3.7 
pipenv install --python=3.7 Flask==1.0.3 apache-airflow==1.10.3

# Set home for airflow in a root of your project (specified in .env file)
echo "AIRFLOW_HOME=${PWD}/airflow" >> .env

# Enters created venv and loads content of .env file 
pipenv shell

# Initialise airflow
airflow initdb
mkdir -p ${AIRFLOW_HOME}/dags/




Note: I will explain the usage of Flask==1.0.3 at the end, but it is there because pipenv checks whether sub-dependencies are compatible in order to ensure reproducibility.

So after these steps you would get the following project structure

my_project
├── airflow
│   ├── airflow.cfg
│   ├── airflow.db
│   ├── dags
│   ├── logs
│   │   └── scheduler
│   │       ├── 2019-07-07
│   │       └── latest -> /path/to/my_project/airflow/logs/scheduler/2019-07-07
│   └── unittests.cfg
├── .env
├── Pipfile
└── Pipfile.lock

Now when you initialise Airflow for the first time, it will create the ${AIRFLOW_HOME}/airflow.cfg file and will use/expand ${AIRFLOW_HOME}/dags as the value for dags_folder. In case you still need a different location for dags_folder, you can use the .env file again:

echo "AIRFLOW__CORE__DAGS_FOLDER=/different/path/to/dags/folder" >> .env

Thus, your .env file will look like:

AIRFLOW_HOME=/path/to/my_project/airflow
AIRFLOW__CORE__DAGS_FOLDER=/different/path/to/dags/folder
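
To check that this setup actually solves the original problem (DAGs not showing up), you can drop a minimal DAG into the dags folder and ask Airflow to list it. This is just a sketch: it assumes the default ${AIRFLOW_HOME}/dags location (adjust the path if you overrode dags_folder as above), uses a hypothetical file name hello_dag.py, and uses the 1.10.x import path matching the version installed earlier.

# Run inside `pipenv shell` so AIRFLOW_HOME is set.
# Write a minimal DAG with a BashOperator (hypothetical file name).
cat > ${AIRFLOW_HOME}/dags/hello_dag.py <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10.x import path

dag = DAG(dag_id="hello_dag", start_date=datetime(2019, 7, 1), schedule_interval=None)

say_hello = BashOperator(task_id="say_hello", bash_command="echo hello", dag=dag)
EOF

# 1.10.x CLI: list the DAGs Airflow can see, then start the webserver (Ctrl+C to stop)
airflow list_dags
airflow webserver -p 8080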



What have we accomplished and why this would work just fine


  1. Since you installed Airflow in a virtual environment, you need to activate it in order to use Airflow.
  2. Since you did it with pipenv, you need to use pipenv shell in order to activate the venv.
  3. Since you use pipenv shell, the variables defined in .env are always exported into your venv. On top of that, pipenv shell spawns a subshell, so when you exit it, all the additional environment variables are cleared as well (see the sketch after this list).
  4. Different projects that use Airflow will have different locations for their log files, etc.
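
As a quick illustration of points 2 and 3, here is roughly what the shell behaviour looks like (a sketch of expected behaviour, not output copied from the original answer):

cd /path/to/my_project
pipenv shell            # spawns a subshell and loads the .env file

echo "${AIRFLOW_HOME}"  # -> /path/to/my_project/airflow
airflow list_dags       # works, because the venv and AIRFLOW_HOME are active

exit                    # leave the subshell
echo "${AIRFLOW_HOME}"  # -> empty again (unless you set it elsewhere)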



Additional notes on pipenv


  1. In order to use the venv created with pipenv as your IDE's project interpreter, use the path provided by pipenv --py.
  2. By default, pipenv creates all venvs in the same global location, like conda does, but you can change that behaviour to creating a .venv in the project's root by adding export PIPENV_VENV_IN_PROJECT=1 to your .bashrc (or other rc file). Then PyCharm will be able to pick it up automatically when you go into the project interpreter settings (see the sketch after this list).
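
For instance, the two notes above could be applied like this (a sketch; the rc file and project path are placeholders):

# Keep each venv inside its project as .venv (add to ~/.bashrc or another rc file)
export PIPENV_VENV_IN_PROJECT=1

# Print the interpreter path to paste into PyCharm's project interpreter settings
cd /path/to/my_project
pipenv --py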



Note on usage of Flask==1.0.3

Airflow 1.10.3 from PyPI depends on flask>=1.0,<2.0 and on jinja2>=2.7.3,<=2.10.0. Today, when I tested the code snippets, the latest available flask was 1.1.0, which depends on jinja2>=2.10.1. This means that although pipenv can install all the required software, it fails to lock the dependencies. So, for clean use of my code samples, I had to specify a version of flask that requires a version of jinja2 compatible with Airflow's requirements. But there is nothing to worry about: the latest version of airflow on GitHub has already fixed that.
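
If you want to inspect those sub-dependency constraints yourself, pipenv can print the resolved dependency tree (the exact versions shown will depend on when you run it):

# Show each installed package together with its sub-dependencies and version constraints
pipenv graph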
