Unable to run multiple pipelines in the desired order by creating a template in Apache Beam


Problem description

I have two separate pipelines, say 'P1' and 'P2'. I need to run P2 only after P1 has completely finished executing, and I need to do the whole operation through a single template.

Basically, a template gets created the moment a run() call is encountered, say p1.run().

So, as far as I can see, I would need two different templates to handle the two pipelines, but that would not satisfy my requirement of strictly ordered pipeline execution.

Another approach I could think of is calling p1.run() inside a ParDo of p2 and making p2's run() wait until p1's run() has finished. I tried this, but got stuck at the IllegalArgumentException given below.

java.io.NotSerializableException: PipelineOptions objects are not serializable and should not be embedded into transforms (did you capture a PipelineOptions object in a field or in an anonymous class?). Instead, if you're using a DoFn, access PipelineOptions at runtime via ProcessContext/StartBundleContext/FinishBundleContext.getPipelineOptions(), or pre-extract necessary fields from PipelineOptions at pipeline construction time.

Is it not possible at all to call the run() of one pipeline inside a transform, say a ParDo, of another pipeline?

If that is the case, how can I satisfy my requirement of running two different pipelines in sequence by creating a single template?

Recommended answer

A template can contain only a single pipeline. To sequence the execution of two separate pipelines, each of which is a template, you'll need to schedule them externally, e.g. via some workflow management system (such as what Anuj mentioned, or Airflow, or something else; you might draw some inspiration from this post, for example).
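
As a rough illustration of what "scheduling externally" can mean, here is a minimal sketch of a driver that launches one job, polls until it reaches a terminal state, and only then launches the next. Everything in it is an assumption made for illustration: `run_in_order`, the `launchers` and `get_state` callables, and the state names `DONE`, `FAILED` and `CANCELLED` are hypothetical and are not Beam or Dataflow APIs. In practice, each launcher would start one template and `get_state` would query whatever service runs the job.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical state names; a real runner reports its own set of states.
STATE_DONE = "DONE"
STATES_FAILED = {"FAILED", "CANCELLED"}


def run_in_order(
    launchers: Sequence[Callable[[], str]],
    get_state: Callable[[str], str],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Launch each job only after the previous one has reached DONE.

    launchers: callables that start a job and return its id (assumed).
    get_state: callable that maps a job id to its current state (assumed).
    Returns the ids of the finished jobs, in launch order.
    """
    finished: List[str] = []
    for launch in launchers:
        job_id = launch()
        while True:
            state = get_state(job_id)
            if state == STATE_DONE:
                break
            if state in STATES_FAILED:
                # Stop the chain: later jobs must not run after a failure.
                raise RuntimeError(f"job {job_id} ended in state {state}")
            sleep(poll_seconds)
        finished.append(job_id)
    return finished
```

Called as `run_in_order([launch_p1, launch_p2], get_state)`, with `launch_p1` and `launch_p2` being hypothetical wrappers that each start one template, this gives P2 a strict "after P1" guarantee without either pipeline knowing about the other. A workflow manager such as Airflow provides the same ordering, along with retries and scheduling.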

We are aware of the need for better sequencing primitives within a single pipeline in Beam, but do not have a concrete design yet.
