Unable to run multiple pipelines in a desired order by creating a template in Apache Beam

Question

I have two separate pipelines, say P1 and P2. I need to run P2 only after P1 has completely finished executing, and I need to do the whole thing through a single template.

As I understand it, a template is created the moment execution reaches a run() call, for example p1.run().

So, as far as I can see, I would need two different templates for the two pipelines, but that would not satisfy my requirement of strictly ordered execution.

Another approach I considered was calling p1.run() inside a ParDo of P2, so that P2's run() waits until P1's run() has finished. I tried this, but got stuck on the IllegalArgumentException below.

java.io.NotSerializableException: PipelineOptions objects are not serializable and should not be embedded into transforms (did you capture a PipelineOptions object in a field or in an anonymous class?). Instead, if you're using a DoFn, access PipelineOptions at runtime via ProcessContext/StartBundleContext/FinishBundleContext.getPipelineOptions(), or pre-extract necessary fields from PipelineOptions at pipeline construction time.
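The mechanism behind this message can be reproduced with plain Java serialization, without Beam. The sketch below is only an illustration: FakeOptions, CapturingFn, and ExtractingFn are hypothetical stand-ins (not Beam classes) for a non-serializable options object, a DoFn that captures it in a field, and a DoFn that pre-extracts the one value it needs at construction time, as the message suggests.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CaptureDemo {
    // Hypothetical stand-in for a non-serializable object such as PipelineOptions.
    static final class FakeOptions {
        final String project;
        FakeOptions(String project) { this.project = project; }
    }

    // Captures the whole options object in a field, so serialization fails.
    static final class CapturingFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private final FakeOptions options;
        CapturingFn(FakeOptions options) { this.options = options; }
    }

    // Pre-extracts only the serializable value it needs at construction time.
    static final class ExtractingFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String project;
        ExtractingFn(FakeOptions options) { this.project = options.project; }
    }

    // Returns true if Java serialization accepts the object, false otherwise.
    static boolean isSerializable(Object fn) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(fn);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FakeOptions options = new FakeOptions("my-project");
        System.out.println(isSerializable(new CapturingFn(options)));  // prints false
        System.out.println(isSerializable(new ExtractingFn(options))); // prints true
    }
}
```

A DoFn has to be serialized so it can be shipped to workers, so anything it holds in a field (or captures from an enclosing scope) must be serializable too. That rules out holding a pipeline or its options inside a ParDo.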

Is it not possible at all to call a pipeline's run() inside a transform, such as a ParDo, of another pipeline?

If so, how can I satisfy my requirement of running two different pipelines in sequence by creating a single template?

Recommended answer

A template can contain only a single pipeline. To sequence the execution of two separate pipelines, each of which is a template, you'll need to schedule them externally, e.g. via some workflow management system (such as what Anuj mentioned, or Airflow, or something else; you might draw some inspiration from this post, for example).
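The control flow of such an external scheduler can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real orchestration API: TemplateLauncher, State, and runInOrder are hypothetical names, and launchAndWait stands for whatever mechanism your scheduler uses to start a job from a template and block until it reaches a terminal state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SequentialTemplates {
    // Simplified terminal states of a launched job (hypothetical).
    enum State { DONE, FAILED }

    // Hypothetical abstraction: launches one template and blocks
    // until the resulting job has reached a terminal state.
    interface TemplateLauncher {
        State launchAndWait(String templateName);
    }

    // Launches the templates strictly in order and stops at the first one
    // that does not finish successfully. Returns the names that completed.
    static List<String> runInOrder(TemplateLauncher launcher, List<String> templates) {
        List<String> completed = new ArrayList<>();
        for (String template : templates) {
            if (launcher.launchAndWait(template) != State.DONE) {
                break; // do not start later pipelines after a failure
            }
            completed.add(template);
        }
        return completed;
    }

    public static void main(String[] args) {
        // Fake launcher for illustration only; a real one would call the job service.
        TemplateLauncher fake = name -> {
            System.out.println("running " + name);
            return State.DONE;
        };
        System.out.println(runInOrder(fake, Arrays.asList("P1", "P2")));
    }
}
```

The point of the design is that the ordering lives outside both pipelines: P2's template is launched only after the scheduler has observed P1 in a successful terminal state.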

We are aware of the need for better sequencing primitives in Beam within a single pipeline, but do not have a concrete design yet.
