Is there any way to poll the system watermark of a running Dataflow pipeline?

Question

It's all in the title. I'd like to run batches off the top of my streaming jobs, and being able to see the watermark as an indicator of when to start would be wonderful.

Answer

You might be able to accomplish this by using Pub/Sub to publish a signal that triggers whatever external processing you want.

To control how often that signal fires, you could use a ParDo to filter your records down based on some criterion, which might take the event timestamps into account.
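
A minimal sketch of such a filtering criterion, written in plain Python so it runs without Beam. The `Record` type, the `is_signal` predicate, and the rule "keep records whose event timestamp falls within `tolerance_s` seconds after each `interval_s` boundary" are all hypothetical; they only illustrate a stateless, timestamp-based filter of the kind that could sit inside a ParDo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    event_time: float  # event timestamp, seconds since the epoch (hypothetical schema)
    payload: str


def is_signal(record: Record, interval_s: float, tolerance_s: float) -> bool:
    """Hypothetical criterion: keep a record if its event timestamp falls
    within `tolerance_s` seconds after a multiple of `interval_s`."""
    return record.event_time % interval_s < tolerance_s


records = [Record(t, f"r{i}") for i, t in enumerate([0, 30, 61, 3600, 5000, 7259])]
signals = [r for r in records if is_signal(r, interval_s=3600, tolerance_s=60)]
print([r.event_time for r in signals])  # [0, 30, 3600, 7259]
```

Because the predicate is stateless, it could be handed to a `beam.Filter` step (which passes extra arguments through to the function), with the surviving records encoded to bytes and written to a topic via `beam.io.WriteToPubSub`. Note that it can fire several times per interval, so the external consumer should tolerate duplicate signals.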

If you explicitly want to use the watermark, you could try using windowing and triggers to emit a record once the watermark passes the end of some interval (for example, a fixed window).
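
To make the firing rule concrete, here is a plain-Python simulation of fixed windows with a watermark trigger. In Beam's Python SDK the comparable setup would be roughly `beam.WindowInto(window.FixedWindows(60), trigger=trigger.AfterWatermark(), accumulation_mode=trigger.AccumulationMode.DISCARDING)` followed by a combine and a Pub/Sub write. However, in a real pipeline the runner computes the watermark; this sketch injects watermark advances as explicit events. The function name `fire_on_watermark` and the `("element" | "watermark", time)` event encoding are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float]  # ("element", event_time) or ("watermark", watermark_time)


def fire_on_watermark(stream: Iterable[Event], window_s: float) -> List[Tuple[float, float, int]]:
    """Simulate fixed windows that emit (start, end, count) once the watermark
    reaches each window's end. Late elements for closed windows are dropped."""
    pending: Dict[float, int] = {}  # window start -> number of elements seen
    fired: List[Tuple[float, float, int]] = []
    watermark = float("-inf")
    for kind, t in stream:
        if kind == "element":
            start = t - (t % window_s)  # assign the element to its fixed window
            if start + window_s <= watermark:
                continue  # late data: this window already fired, so drop it here
            pending[start] = pending.get(start, 0) + 1
        elif kind == "watermark":
            watermark = max(watermark, t)  # watermarks only move forward
            # Fire every pending window whose end the watermark has reached, oldest first.
            for start in sorted(s for s in pending if s + window_s <= watermark):
                fired.append((start, start + window_s, pending.pop(start)))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return fired


stream = [
    ("element", 5), ("element", 30), ("element", 70),
    ("watermark", 59),   # window [0, 60) is not closed yet
    ("watermark", 60),   # window [0, 60) fires
    ("element", 20),     # late: its window already fired, so it is dropped
    ("element", 119),
    ("watermark", 200),  # window [60, 120) fires
]
print(fire_on_watermark(stream, window_s=60))  # [(0, 60, 2), (60, 120, 2)]
```

Each fired tuple effectively says "the watermark has passed time `end`", which is the indicator the question asks for; publishing it to Pub/Sub would let an external batch job start from it.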

I don't think there is any explicit way to access the watermark.
