How to customize a GCP Dataflow template?

Question

I intend to use the Pub/Sub to Text Files on Cloud Storage Dataflow template with a few customizations, such as processing (massaging) the Pub/Sub message before writing it to Cloud Storage.

I have written the Apache Beam pipeline code but am confused about how to deploy it. The parameters it consumes will be exactly the same as those of the Pub/Sub to Text Files on Cloud Storage template.

From the documentation I understand that I can either use one of the Google-provided templates or create my own. But instead of creating my own template, is there a better way to customize a Google-provided template, since it already meets most of my requirements?

Recommended answer

I think we are in an all-or-nothing situation. The only customization that does not require creating your own template is what is exposed through parameters, and parameters do not accept PTransforms.

Since you need to modify the ingested Pub/Sub messages, you will need to create your own PTransform, integrate it into your pipeline, and generate the associated template.
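As a rough illustration of where such a step would sit, here is a minimal sketch assuming the pipeline is written with the Beam Python SDK. The function names `massage_message` and `build_pipeline`, the JSON clean-up logic, and the 5-minute window are all hypothetical and not taken from the Google-provided template; only the Beam transforms (`ReadFromPubSub`, `Map`, `WindowInto`, `fileio.WriteToFiles`) are real library calls.

```python
import json


def massage_message(payload: bytes) -> str:
    """Hypothetical clean-up step: turn one Pub/Sub payload into one output line."""
    text = payload.decode("utf-8").strip()
    try:
        record = json.loads(text)
    except json.JSONDecodeError:
        record = None
    # Anything that is not a JSON object is kept verbatim under a "raw" key.
    if not isinstance(record, dict):
        record = {"raw": text}
    record["processed"] = True
    return json.dumps(record, sort_keys=True)


def build_pipeline(pipeline, input_topic: str, output_dir: str, window_seconds: int = 300):
    """Insert the custom step between the Pub/Sub read and the Cloud Storage write."""
    # Imported lazily so massage_message can be unit-tested without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio
    from apache_beam.transforms import window

    return (
        pipeline
        | "ReadPubSub" >> beam.io.ReadFromPubSub(topic=input_topic)  # yields bytes
        | "Massage" >> beam.Map(massage_message)  # the custom transform
        | "Window" >> beam.WindowInto(window.FixedWindows(window_seconds))
        | "WriteFiles" >> fileio.WriteToFiles(path=output_dir)  # default sink writes text lines
    )
```

Keeping the message logic in a plain function means it can be tested on its own, for example `massage_message(b'{"id": 1}')`, before it is wired into a streaming job.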

Given that it is only one small addition, your best option is to clone the template sources and copy them into your own local Beam project (or build the template from the cloned project). Do not modify the Google-provided template itself; change only your copy of the example code. Then generate the template as described in the documentation, and you are ready to run it.
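If the customized pipeline is written with the Beam Python SDK, staging it as a classic template comes down to running the pipeline once with a `--template_location` option. The sketch below only assembles those flags; the helper name `template_staging_args` and the bucket layout (`staging`, `temp`, `templates`) are assumptions for illustration, while the option names themselves are standard Beam pipeline options. Runtime parameters for the template are outside the scope of this sketch.

```python
from typing import List


def template_staging_args(project: str, region: str, bucket: str, template_name: str) -> List[str]:
    """Build the pipeline flags that stage a classic Dataflow template (hypothetical helper)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        # Assumed bucket layout; any Cloud Storage paths you can write to will do.
        f"--staging_location=gs://{bucket}/staging",
        f"--temp_location=gs://{bucket}/temp",
        # With this option set, the run stages a template file instead of launching a job.
        f"--template_location=gs://{bucket}/templates/{template_name}",
    ]
```

The resulting list could be passed to `apache_beam.options.pipeline_options.PipelineOptions(...)` in the pipeline's entry point, or supplied as command-line arguments to the module that defines the pipeline.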
