Notifying Google PubSub when Dataflow job is complete
Question
Is there a way to publish a message onto Google Pubsub after a Google Dataflow job completes? We need to notify dependent systems that the processing of incoming data is complete. How could Dataflow publish after writing data to the sink?
We want to be notified after the pipeline finishes writing to GCS. Our pipeline looks like this:
Pipeline p = Pipeline.create(options);
p.apply(....)
 .apply(AvroIO.Write.named("Write to GCS")
     .withSchema(Extract.class)
     .to(options.getOutputPath())
     .withSuffix(".avro"));
p.run();
If we add logic outside of the pipeline.apply(...) methods, we are notified when the code finishes executing, not when the pipeline completes. Ideally we could add another .apply(...) after the AvroIO sink and publish a message to PubSub.
Answer
You have two options to get notified when your pipeline finishes, and then subsequently publish a message - or do whatever else you want to after the pipeline finishes running:
- Use the BlockingPipelineRunner. This will run your pipeline synchronously, so any code after p.run() executes only once the job has finished.
- Use the DataflowPipelineRunner. This will run your pipeline asynchronously. You can then poll the pipeline for its status and wait for it to finish.
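The second option might look like the sketch below. This assumes the pre-Beam Dataflow Java SDK (com.google.cloud.dataflow.sdk), where p.run() under the DataflowPipelineRunner returns a DataflowPipelineJob handle that can be polled; the publishDone() helper is hypothetical and stands in for whatever Pub/Sub publish call your system uses.

```java
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.PipelineResult;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineJob;

public class NotifyAfterRun {

  public static void runAndNotify(Pipeline p) throws Exception {
    // Under DataflowPipelineRunner, run() returns immediately with a
    // job handle rather than blocking until the job finishes.
    DataflowPipelineJob job = (DataflowPipelineJob) p.run();

    // Poll until the job reaches a terminal state (DONE, FAILED, CANCELLED).
    while (!job.getState().isTerminal()) {
      Thread.sleep(30_000); // poll every 30 seconds
    }

    // Only notify downstream systems on success.
    if (job.getState() == PipelineResult.State.DONE) {
      publishDone(); // hypothetical: publish a "done" message to a Pub/Sub topic
    }
  }

  // Hypothetical helper: publish via the Cloud Pub/Sub client library.
  static void publishDone() { /* ... */ }
}
```

With the first option you would instead set the runner via options.setRunner(...) to the blocking runner and simply place the Pub/Sub publish call after p.run(), since that call does not return until the job completes.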