Using custom Docker containers in Dataflow


Question

From a related question ("Image for Google Cloud Dataflow instances"), I found that Google Cloud Dataflow uses Docker containers for its workers.

I see that it's possible to find out the image name of the Docker container.

But is there a way I can get this Docker container (i.e., which repository do I pull it from?), modify it, and then tell my Dataflow job to use the modified container?

The reason I ask is that we need to install various C++, Fortran, and other library code in our Docker containers so that the Dataflow jobs can call them. These installations are very time-consuming, so we don't want to use the "resource" property option in Dataflow.

Recommended answer

Update, May 2020

Custom containers are only supported within the Beam portability framework.

Pipelines launched within the portability framework currently must pass --experiments=beam_fn_api, either explicitly (as a user-provided flag) or implicitly (for example, all Python streaming pipelines pass it).
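As a rough illustration of passing the flag explicitly, here is a minimal sketch in Python. The helper `build_pipeline_args` and all example values (project, region, bucket, image name) are hypothetical and not part of the original answer. The container flag name is also an assumption: older Beam SDKs used `--worker_harness_container_image` and newer ones use `--sdk_container_image`, so check the documentation for your SDK version.

```python
from typing import List, Optional


def build_pipeline_args(
    project: str,
    region: str,
    temp_location: str,
    container_image: Optional[str] = None,
    container_flag: str = "--worker_harness_container_image",
) -> List[str]:
    """Return command-line style flags for a Dataflow pipeline that
    opts in to the portability framework.

    This helper is hypothetical; it only assembles strings and does
    not talk to Dataflow.
    """
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Explicit opt-in to the portability framework, as described above.
        "--experiments=beam_fn_api",
    ]
    if container_image is not None:
        # Assumption: the flag name depends on the Beam SDK version
        # (newer SDKs use --sdk_container_image instead).
        args.append(f"{container_flag}={container_image}")
    return args


if __name__ == "__main__":
    # Hypothetical example values.
    flags = build_pipeline_args(
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        container_image="gcr.io/my-project/beam-custom:latest",
    )
    for flag in flags:
        print(flag)
```

If the Beam Python SDK is installed, a list like this can be handed to `apache_beam.options.pipeline_options.PipelineOptions(flags)` when constructing the pipeline; the same flags can also be passed directly on the command line.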

There will be more Dataflow-specific documentation once the Dataflow runner fully supports custom containers. For custom container support in other Beam runners, see: http://beam.apache.org/documentation/runtime/environments.

The Docker containers used for the Dataflow workers are currently private and can't be modified or customized.

In fact, they are served from a private Docker repository, so I don't think you can pull them onto your own machine.
