Use separate environ and sys.path between dags


Problem description

* TL;DR: This question was originally based on a problem that was later determined to be the issue described in this question's updated title. Skip to "Update 2" for the most relevant question details.

I have a dag file that imports a Python list of dicts from another Python file in another location and creates a dag based on the list's dict values, and airflow is having a weird problem where it appears to see something different than when I run the dag file manually. A snippet like...

...
import sys
from os import environ

environ["PROJECT_HOME"] = "/path/to/some/project/files"
# import certain project files
sys.path.append(environ["PROJECT_HOME"])
import tables as tt

tables = tt.tables

for table in tables:
    print(table)
    assert isinstance(table, dict)
    # <create some dag task 1>
    # <create some dag task 2>
    ...

When running the py file manually from the ~/airflow/dag/ dir, no errors are thrown and the for loop prints the dicts, but airflow apparently sees things differently in the webserver and when running airflow list_dags. Running airflow list_dags I get the error

    assert isinstance(table, dict)
AssertionError

and I don't know how to test what is causing this, since again, when running the py file manually from the dag location there is no problem: the print statement shows the dicts, and the webserver UI shows no further error message.
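One way to narrow this down (a debugging sketch, not from the original post; `describe_module` is a hypothetical helper name) is to log, right before the failing assert, which file the imported module was actually loaded from and the interpreter's search order:

```python
import sys

def describe_module(mod):
    """Return the file a module was loaded from -- useful for spotting
    a similarly named module shadowing the one you expected."""
    return getattr(mod, "__file__", "<builtin or frozen>")

# In the dag file, right before `assert isinstance(table, dict)`,
# one could log:
#   print(describe_module(tt)); print(sys.path)
# Self-contained demo here uses a stdlib module in place of `tt`:
import json
print(describe_module(json))  # path of the json module actually loaded
print(sys.path[:3])           # front of the search order the interpreter uses
```

If the printed path points into a different project than expected, the import is being shadowed.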

Does anyone know what could be going on here? Maybe something about how the imports are working?

* Update 1

Seeing more weirdness: when calling functions from the imported python module, everything runs fine when running the dag file manually, but airflow list_dags says...


AttributeError: 'module' object has no attribute 'my_func'

making me even further suspect some import weirdness, even though this is the exact same process I am using in another dag file (i.e. setting some environ values and appending to sys.path) to import modules for that dag, and I have no problems there.

* Update 2

The problem appears to be (after printing various sys.path, environ, and module.__all__ info at the erroring assert) that the similarly-named module being imported comes from another project for which I did this same exact procedure. I.e. I have another file that does...

...
import sys
from os import environ

environ["PROJECT_HOME"] = "/path/to/some/project/files"
# import certain project files
sys.path.append(environ["PROJECT_HOME"])
import tables as tt

tables = tt.tables

for table in tables:
    print(table)
    assert isinstance(table, dict)
    # <create some dag task 1>
    # <create some dag task 2>
    ...

and this project home is getting used instead, loading a similarly-named module that also has an object named what I was expecting (even when I insert the project's folder at the front of sys.path). Other than making packaged dags, is there a way to keep airflow from combining all of the environ and sys.path values of the different dags (since I use $PROJECT_HOME in various bash and python task scripts)?
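The collision described above can be reproduced in a minimal self-contained sketch (temp dirs and the module name `tables_demo` stand in for the two real project homes and the `tables` module): every DAG file is executed by the same interpreter, so `sys.path.append()` calls from all DAG files accumulate, and a bare `import <name>` resolves to whichever matching file Python finds first, then stays cached in `sys.modules`.

```python
import os
import sys
import tempfile

# Simulate two projects that each define a module with the same name.
proj_a = tempfile.mkdtemp()  # stands in for project A's PROJECT_HOME
proj_b = tempfile.mkdtemp()  # stands in for project B's PROJECT_HOME
for path, tag in [(proj_a, "project_a"), (proj_b, "project_b")]:
    with open(os.path.join(path, "tables_demo.py"), "w") as f:
        f.write("tables = ['%s']\n" % tag)

sys.path.append(proj_a)  # done by DAG file A
sys.path.append(proj_b)  # done later by DAG file B, same interpreter

import tables_demo  # resolves to project A's copy: its path comes first
print(tables_demo.tables)  # ['project_a'], even though B appended its path too
```

Once project A's copy is imported, `sys.modules` caches it, so even a DAG file that appended project B's path first in its own source still gets project A's module if A's DAG file ran earlier in the same process.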

Answer

For bringing in specific modules from other paths, I use the solution here to import other python modules by specifying their absolute file path.

For running various python scripts as airflow tasks using different python interpreters, I do something like...

do_stuff_a = BashOperator(
        task_id='my_task_a',
        bash_command='/path/to/virtualenv_a/bin/python /path/to/script_a.py',
        execution_timeout=timedelta(minutes=30),
        dag=dag)

as done in a similar question here.
