Running DBT within Airflow through the Docker Operator

Question

Building on my earlier question, "How to run DBT in Airflow without copying our repo", I am currently running Airflow and syncing the DAGs via git. I am considering different options for including DBT in my workflow. One suggestion, by louis_guitton, is to Dockerize the DBT project and run it in Airflow via the Docker Operator.

I have no prior experience with the Docker Operator in Airflow, or with DBT in general. I am wondering whether anyone has tried this or can share some insight from incorporating that workflow. My main questions are:

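  1. Should DBT as a whole project be run as one Docker container, or should it be broken down? (For example: are tests run in a separate container from dbt tasks?)
  2. Are logs and the UI from DBT accessible and/or still useful when run via the Docker Operator?
  3. How would partial pipelines be run? (For example: wanting to run only a part of the pipeline.)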

Recommended answer

Judging by your questions, you would benefit from first trying to dockerise dbt on its own, independently of Airflow. Many of your questions would then disappear. Here are my answers anyway.

Should DBT as a whole project be run as one Docker container, or should it be broken down? (For example: are tests run in a separate container from dbt tasks?)

I suggest you build one Docker image for the entire project. Since dbt is a Python CLI tool, the image can be based on the python image. You then use the command (CMD) arguments passed to the image to run any dbt command you would run outside Docker. Remember the syntax of docker run (which has nothing to do with dbt): you can specify any COMMAND you want to run at invocation time:

$ docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]
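
As a rough illustration of that syntax, the sketch below assembles a docker run invocation as an argument list in Python. The helper name docker_run_argv is hypothetical, and my-dbt-image is only a placeholder image name; --rm is a standard docker run option that removes the container once it exits.

```python
import shlex
from typing import List, Sequence


def docker_run_argv(
    image: str,
    command: Sequence[str] = (),
    options: Sequence[str] = (),
) -> List[str]:
    """Assemble `docker run [OPTIONS] IMAGE [COMMAND] [ARG...]` as an argv list.

    Hypothetical helper for illustration only; it does not call Docker.
    """
    return ["docker", "run", *options, image, *command]


# Everything after the image name is the COMMAND executed inside the container.
argv = docker_run_argv("my-dbt-image", ["dbt", "run"], options=["--rm"])
print(shlex.join(argv))  # docker run --rm my-dbt-image dbt run
```

The same image can therefore serve dbt run, dbt test, or any other dbt subcommand; only the trailing command changes.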

Also, the first hit on Google for "docker dbt" is this dockerfile, which can get you started.

Are logs and the UI from DBT accessible and/or still useful when run via the Docker Operator?

Again, this is not a dbt question but rather a Docker question or an Airflow question.

Can you see the logs in the Airflow UI when using a DockerOperator? Yes, see this how-to blog post with screenshots.

Can you access logs from a Docker container? Yes. Docker containers emit logs to the stdout and stderr output streams (which you can see in Airflow, since Airflow picks them up). The logs are also stored in JSON files on the host machine, under /var/lib/docker/containers/. If you have more advanced needs, you can pick up those logs with a tool (or a simple BashOperator or PythonOperator) and do what you need with them.
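
If you go the PythonOperator route, reading those files could look roughly like the sketch below. It assumes Docker's default json-file logging driver, which writes one JSON object per line with log, stream and time keys. The function name iter_docker_log and the sample lines are made up for illustration, and the callable would need read access to the host's log directory.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_docker_log(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (time, stream, message) tuples from json-file log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Each record stores the raw output including its trailing newline.
        yield record["time"], record["stream"], record["log"].rstrip("\n")


# Made-up sample lines in the json-file format (not real dbt output).
sample = [
    '{"log":"hello from the container\\n","stream":"stdout","time":"2021-01-01T00:00:00.000000000Z"}',
    '{"log":"something went wrong\\n","stream":"stderr","time":"2021-01-01T00:00:01.000000000Z"}',
]

# Keep only what the container wrote to stderr.
errors = [msg for _, stream, msg in iter_docker_log(sample) if stream == "stderr"]
print(errors)  # ['something went wrong']
```

In practice you would pass an open file handle instead of the sample list, since a file object iterates line by line.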

How would partial pipelines be run? (For example: wanting to run only a part of the pipeline.)

See answer 1: you would run your dbt Docker image with the command

$ docker run my-dbt-image dbt run -m stg_customers
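
On the Airflow side, the same command can be handed to a DockerOperator. The following is a minimal sketch, not a tested production DAG: it assumes the apache-airflow-providers-docker package is installed and that the Airflow worker can reach a Docker daemon. The DAG id, task ids and the dbt_command helper are hypothetical names, and the test task is one possible way to keep tests in a separate container from the run while reusing the single image.

```python
from datetime import datetime
from typing import List, Optional

try:
    from airflow import DAG
    from airflow.providers.docker.operators.docker import DockerOperator
except ImportError:  # Airflow or its Docker provider is not installed here
    DAG = None
    DockerOperator = None

DBT_IMAGE = "my-dbt-image"  # image name taken from the command above


def dbt_command(subcommand: str, models: Optional[str] = None) -> List[str]:
    """Build the COMMAND part of `docker run IMAGE COMMAND` for dbt."""
    cmd = ["dbt", subcommand]
    if models:
        cmd += ["-m", models]  # restrict the command to the selected models
    return cmd


if DAG is not None:
    with DAG(
        dag_id="dbt_in_docker",  # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        catchup=False,
    ) as dag:
        # One image, one container per task: run first, then test.
        run_staging = DockerOperator(
            task_id="dbt_run_stg_customers",
            image=DBT_IMAGE,
            command=dbt_command("run", "stg_customers"),
        )
        test_staging = DockerOperator(
            task_id="dbt_test_stg_customers",
            image=DBT_IMAGE,
            command=dbt_command("test", "stg_customers"),
        )
        run_staging >> test_staging
```

Running a different part of the pipeline then only means changing the models argument. Newer dbt releases document --select as the preferred spelling of -m, so adjust the flag to the dbt version baked into your image.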
