Perform action after Dataflow pipeline has processed all data

Question

Is it possible to perform an action once a batch Dataflow job has finished processing all data? Specifically, I'd like to move the text file that the pipeline just processed to a different GCS bucket. I'm not sure where to place that in my pipeline to ensure it executes once after the data processing has completed.

Recommended answer

I don't see why this needs to happen after the pipeline has executed. You could use side outputs to write the file to multiple buckets, which saves you the copy after the pipeline finishes.

If that's not going to work for you (for whatever reason), you can simply run your pipeline in blocking execution mode, i.e. call pipeline.run().waitUntilFinish(), and then write the rest of your code (which does the copy) after that call.

[..]
// do some stuff before the pipeline runs
Pipeline pipeline = ...
pipeline.run().waitUntilFinish();
// do something after the pipeline finishes here
[..]
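
As a rough sketch of the "do something after the pipeline finishes" step, the move can be guarded by the job's terminal state so the input is only archived after a successful run. Everything below is illustrative rather than part of the original answer: the State enum is a hypothetical stand-in for the PipelineResult.State value that waitUntilFinish() returns in the Beam Java SDK, moveIfDone is a made-up helper name, and a local java.nio.file move stands in for the GCS bucket-to-bucket move so the example runs without any cloud dependencies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class PostPipelineMove {
    // Hypothetical stand-in for Beam's PipelineResult.State (only a few values mirrored).
    enum State { DONE, FAILED, CANCELLED }

    /**
     * Moves `source` into `archiveDir` only if the job finished successfully.
     * Returns the new location, or null if nothing was moved.
     */
    static Path moveIfDone(State state, Path source, Path archiveDir) throws IOException {
        if (state != State.DONE) {
            return null; // leave the input in place so the job can be retried
        }
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(source.getFileName());
        return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Local temp files stand in for the processed input and the archive bucket.
        Path input = Files.createTempFile("input", ".txt");
        Path archive = Files.createTempDirectory("archive");

        // In a real Beam job the state would come from pipeline.run().waitUntilFinish();
        // here it is hard-coded for illustration.
        State state = State.DONE;

        Path moved = moveIfDone(state, input, archive);
        System.out.println(moved != null ? "moved to " + moved : "left in place");
    }
}
```

Checking the state before moving is a design choice, not something the answer requires: it keeps the source file where it is when the job fails or is cancelled, which makes a rerun straightforward.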
