在 Apache Beam 中的流式传输中强制使用空窗格/窗口 [英] Forcing an empty pane/window in streaming in Apache Beam

查看:21
本文介绍了在 Apache Beam 中的流式传输中强制使用空窗格/窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试实现一个管道并接收数据流,如果分钟间隔中有任何元素,则每分钟输出一个 True,如果没有,则输出 False.如果持续时间内没有元素,则窗格(具有永久时间触发器)或窗口(固定窗口)似乎不会触发.

I am trying to implement a pipeline and takes in a stream of data and every minutes output a True if there is any element in the minute interval or False if there is none. The pane (with forever time trigger) or window (fixed window) does not seem to trigger if there is no element for the duration.

我正在考虑的一种解决方法是将流放入全局窗口中,使用 ValueState 来保持队列来累积数据,并使用计时器作为触发器来检查队列.我想知道是否有任何更简洁的方法来实现这一点.

One workaround I am thinking is to put the stream into a global window, use a ValueState to keep a queue to accumulate the data and a timer as a trigger to exam the queue. I wonder if there is any neater way of achieving this.

谢谢.

推荐答案

我认为您的计时器和状态解决方案是实现此目的的好方法.但是,请记住,在您收到至少一个密钥元素之前,不会设置您的计时器.

I think your timers and state solution is a good way to do this. However, keep in mind that your timers will not be set until you receive at least one element for a key.

如果这是一个问题,那么您可以做的另一件事是注入 PCollection,以便保证每个窗口至少有一个虚拟元素.然后你可以使用 ValueState 检查除了虚拟元素之外的任何元素是否已经到达.或者,在窗口上使用 Count.PerElement 并检查该窗口是否有超过 1 个元素(附加元素,不是虚拟元素).

If this is an issue, then the other thing you could do is inject a PCollection so that every window is guaranteed to have at least one dummy element. Then you can use ValueState to check if any element besides the dummy element has arrived. Or alternatively use Count.PerElement over the window and check if there is more than 1 element(An additional element, that is not the dummy element) for that window.

这篇关于在 Apache Beam 中的流式传输中强制使用空窗格/窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆