Apache Flink:即使没有输入记录到达给定聚合窗口,也基于键控状态在 Flink 中发出输出记录 [英] Apache Flink: Emit output records in Flink based on keyed state even if no input records have arrived for a given aggregation window

查看:31
本文介绍了Apache Flink:即使没有输入记录到达给定聚合窗口,也基于键控状态在 Flink 中发出输出记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将 Apache Flink 用于 IoT 应用程序.我有一堆设备可以处于几种状态中的一种.当设备更改状态时,它会发出一条消息,其中包含事件时间戳及其更改的状态.对于一台设备,这可能如下所示:

I am trying to use Apache Flink for an IoT application. I have a bunch of devices that can be in one of several states. When a device changes state, it emits a message that includes an event timestamp and the state it changed to. For one device, this might look like this:

{Device_id: 1, Event_Timestamp: 9:01, State: STATE_1}

{Device_id: 1, Event_Timestamp: 9:01, State: STATE_1}

{Device_id: 1, Event_Timestamp: 9:03, State: STATE_2}

{Device_id: 1, Event_Timestamp: 9:03, State: STATE_2}

对于每个设备,我需要为设备在给定的五分钟窗口内在每个状态中花费的时间生成一个五分钟的聚合.为了做到这一点,我计划使用键控状态来存储每个设备的最后状态更新,以便我知道设备在聚合窗口开始时处于什么状态.例如,假设 ID 为1"的设备有一个键控状态值,表示它在 8:58 进入STATE_2".然后 9:00 - 9:05 窗口的聚合输出会像这样(基于上面的两个示例事件):

For each device, I need to produce a five minute aggregate for the amount of time the device spent in each state for the given five minute window. In order to do this, I plan to use the keyed state to store the last state update for each device, so that I know what state the device was in for the beginning of the aggregation window. For example, assume the device with id "1" has a keyed state value that said it entered "STATE_2" at 8:58. Then the output of the aggregation for the 9:00 - 9:05 window would like like this (based on the two example events above):

{Device_id:1,时间戳:9:00,状态:STATE_1,持续时间:120 秒}

{Device_id: 1, Timestamp: 9:00, State: STATE_1, Duration: 120 seconds}

{Device_id:1,时间戳:9:00,状态:STATE_2,持续时间:180 秒}

{Device_id: 1, Timestamp: 9:00, State: STATE_2, Duration: 180 seconds}

我的问题是:如果窗口有事件,Flink 只会为给定的 device_id 打开一个窗口.这意味着如果设备超过 5 分钟没有改变状态,则没有记录进入流,因此窗口不会打开.但是,我需要发出一条记录,表明设备在当前状态下花费了整整五分钟,这取决于键控状态中存储的内容.例如,Flink 应该发出 9:05-9:10 的记录,表示 ID 为1"的设备在STATE_2"中花费了所有 300 秒.

My problem is this: Flink will only open a window for a given device_id if there is an event for the window. This means that if a device does not change state for over 5 minutes, no record will enter the stream, so the window will not open. However, I need to emit a record that says that the device spent the entire five minutes in whatever the current state is based on what is stored in the keyed state. For example, Flink should emit a record for 9:05-9:10 that says the device with id "1" spent all 300 seconds in "STATE_2".

有没有办法输出每个设备在五分钟聚合窗口内在给定状态下花费的时间量的记录,即使状态在这五分钟内没有改变,因此设备不发送事件?如果没有,是否有任何解决方法可以用来获取应用程序所需的输出事件?

Is there a way to output records for the amount of time each device spent in a given state for a five minute aggregation window EVEN IF the state does not change within those five minutes, and, thus, the device sends no events? If not, are there any workarounds I can use to get the output events I need for my application?

推荐答案

实现这一点的一种直接方法是使用 ProcessFunction 而不是窗口化.您可以保留任何适合您的应用程序的键控状态,并使用计时器来触发生成定期报告.

A straightforward way to implement this would be to use a ProcessFunction rather than windowing. You can keep whatever keyed state is convenient for your application, and use timers to trigger producing the periodic reports.

这篇关于Apache Flink:即使没有输入记录到达给定聚合窗口,也基于键控状态在 Flink 中发出输出记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆