Notifying Google PubSub when Dataflow job is complete


Question

Is there a way to publish a message onto Google Pubsub after a Google Dataflow job completes? We have a need to notify dependent systems that the processing of incoming data is complete. How could Dataflow publish after writing data to the sink?

We want to notify after a pipeline completes writing to GCS. Our pipeline looks like this:

Pipeline p = Pipeline.create(options);
p.apply(....)
 .apply(AvroIO.Write.named("Write to GCS")
              .withSchema(Extract.class)
              .to(options.getOutputPath())
              .withSuffix(".avro"));
p.run();

If we add logic outside of the pipeline.apply(...) methods we are notified when the code completes execution, not when the pipeline is completed. Ideally we could add another .apply(...) after the AvroIO sink and publish a message to PubSub.

Answer

You have two options to get notified when your pipeline finishes, and then subsequently publish a message - or do whatever you want to after the pipeline finishes running:

  1. Use the BlockingPipelineRunner. This will run your pipeline synchronously.
  2. Use the DataflowPipelineRunner. This will run your pipeline asynchronously. You can then poll the pipeline for its status, and wait for it to finish.
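The second option can be sketched roughly as follows. This assumes the Dataflow SDK 1.x API (`DataflowPipelineJob`, `PipelineResult.State`); `publishDoneMessage()` is a hypothetical helper wrapping the Pub/Sub client, not part of the SDK:

```java
// Run asynchronously with DataflowPipelineRunner, poll for a terminal
// state, then publish the notification once the job succeeds.
DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
options.setRunner(DataflowPipelineRunner.class);

Pipeline p = Pipeline.create(options);
// ... build the pipeline as shown above ...

DataflowPipelineJob job = (DataflowPipelineJob) p.run();

// Poll until the job reaches a terminal state (DONE, FAILED, CANCELLED).
while (!job.getState().isTerminal()) {
    Thread.sleep(30_000);  // check every 30 seconds
}

if (job.getState() == PipelineResult.State.DONE) {
    publishDoneMessage();  // hypothetical: publish to the notification topic
}
```

With the first option (`BlockingPipelineRunner`), `p.run()` itself blocks until the job finishes, so you can simply publish on the line after it.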

