Apache Beam - skip pipeline step
Question
I'm using Apache Beam to set up a pipeline consisting of 2 main steps:
- Transform the data using a Beam transform
- Load the transformed data into BigQuery
The pipeline is set up as follows:
myPCollection = (org.apache.beam.sdk.values.PCollection<myCollectionObjectType>) myInputPCollection
    .apply("do a parallel transform",
        ParDo.of(new MyTransformClassName.MyTransformFn()));
myPCollection
    .apply("Load BigQuery data for PCollection",
        BigQueryIO.<myCollectionObjectType>write()
            .to(new MyDataLoadClass.MyFactTableDestination(myDestination))
            .withFormatFunction(new MyDataLoadClass.MySerializationFn()));
I've looked at a related question, which suggests that I may be able to dynamically change which output the data is passed to, following the parallel transform in step 1.
How do I do this? I don't know how to choose whether or not to pass myPCollection from step 1 to step 2. I need to skip step 2 if the object in myPCollection from step 1 is null.
Recommended answer
You just don't emit the element from your MyTransformClassName.MyTransformFn when you don't want it in the next step, for example something like this:
class MyTransformClassName.MyTransformFn extends... {
    @ProcessElement
    public void processElement(ProcessContext c, ...) {
        ...
        result = ...
        if (result != null) {
            c.output(result); // only output something that's not null
        }
    }
}
This way nulls don't reach the next step.
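To make the "emit zero or one outputs per element" idea concrete, here is a minimal plain-Java sketch that does not use Beam at all. Everything in it is hypothetical and only for illustration: `processElement` stands in for the DoFn method, the `Consumer` argument plays the role of `c.output(...)`, and `runStep` and `parseOrNull` are made-up helpers, not Beam APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class SkipNullSketch {

    // Hypothetical stand-in for a DoFn's per-element method: `out` plays the
    // role of Beam's c.output(...), and is called zero or one times.
    public static <I, O> void processElement(I element, Function<I, O> transform, Consumer<O> out) {
        O result = transform.apply(element);
        if (result != null) {
            out.accept(result); // only emit results that are not null
        }
    }

    // Hypothetical driver: runs every input through processElement and collects
    // whatever was emitted, i.e. what the next step would receive.
    public static <I, O> List<O> runStep(List<I> inputs, Function<I, O> transform) {
        List<O> emitted = new ArrayList<>();
        for (I input : inputs) {
            processElement(input, transform, emitted::add);
        }
        return emitted;
    }

    // Hypothetical transform: parses digit strings, returns null for anything else.
    public static Integer parseOrNull(String s) {
        return s.matches("\\d+") ? Integer.valueOf(s) : null;
    }

    public static void main(String[] args) {
        List<Integer> out = runStep(Arrays.asList("1", "oops", "3"), SkipNullSketch::parseOrNull);
        System.out.println(out); // prints [1, 3]
    }
}
```

The same reasoning carries over to the pipeline: the BigQuery step stays in the pipeline graph, it simply never sees the elements that step 1 chose not to emit.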
See the ParDo section of the guide for more details: https://beam.apache.org/documentation/programming-guide/#pardo