在Apache Beam中强制流中的空白窗格/窗口 [英] Forcing an empty pane/window in streaming in Apache Beam

查看:102
本文介绍了在Apache Beam中强制流中的空白窗格/窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试实现管道并接收数据流,如果分钟间隔中有任何元素,则每分钟每分钟输出True,如果没有元素则每分钟输出False.如果在持续时间内没有任何元素,则窗格(带有永久时间触发器)或窗口(固定窗口)似乎不会触发.

I am trying to implement a pipeline and takes in a stream of data and every minutes output a True if there is any element in the minute interval or False if there is none. The pane (with forever time trigger) or window (fixed window) does not seem to trigger if there is no element for the duration.

我正在考虑的一种解决方法是将流放入全局窗口,使用ValueState保留队列以累积数据,并使用计时器作为检查队列的触发器.我想知道是否有更整洁的方法来实现这一目标.

One workaround I am thinking is to put the stream into a global window, use a ValueState to keep a queue to accumulate the data and a timer as a trigger to exam the queue. I wonder if there is any neater way of achieving this.

谢谢.

推荐答案

我认为您的计时器和状态解决方案是实现此目的的好方法.但是,请记住,只有收到至少一个键元素后,您的计时器才会被设置.

I think your timers and state solution is a good way to do this. However, keep in mind that your timers will not be set until you receive at least one element for a key.

如果这是一个问题,那么您可以做的另一件事是注入一个PCollection,以确保每个窗口至少具有一个虚拟元素.然后,您可以使用ValueState检查虚拟元素之外的其他元素是否已到达.或者,在窗口上使用Count.PerElement并检查该窗口是否有1个以上的元素(另一个元素,不是虚拟元素).

If this is an issue, then the other thing you could do is inject a PCollection so that every window is guaranteed to have at least one dummy element. Then you can use ValueState to check if any element besides the dummy element has arrived. Or alternatively use Count.PerElement over the window and check if there is more than 1 element(An additional element, that is not the dummy element) for that window.

这篇关于在Apache Beam中强制流中的空白窗格/窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆