使用广播状态强制使用伪造消息关闭窗口 [英] Using Broadcast State To Force Window Closure Using Fake Messages

查看:44
本文介绍了使用广播状态强制使用伪造消息关闭窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说明:

当前,我正在将Flink与IOT设置一起使用.从本质上讲,设备正在发送数据(例如device_id,device_type,event_timestamp等),而我对发送消息的时间没有任何控制权.然后,我通过device_id和device_type来键入流以进行聚合.我想使用事件时间,因为它可以确保设置的计时器在发生故障时以确定性方式触发.但是,由于这并不总是高吞吐量流,因此可以在10分钟的聚合期间内打开一个窗口,但是直到大约40分钟后才打开下一个窗口.尽管计算将最终完成,但是它将非常晚地输出我想要的结果.

Currently I am working on using Flink with an IOT setup. Essentially, devices are sending data such as (device_id, device_type, event_timestamp, etc) and I don't have any control over when the messages get sent. I then key the steam by device_id and device_type to preform aggregations. I would like to use event-time given that is ensures the timers which are set trigger in a deterministic nature given a failure. However, given that this isn't always a high throughput stream a window could be opened for a 10 minute aggregation period, but not have its next point come until approximately 40 minutes later. Although the calculation would aggregation would eventually be completed it would output my desired result extremely late.

因此,为此,我的解决方法是创建一个额外的外部源,除了发送虚假消息外,该源不执行其他操作.通过按照我的10分钟汇总周期抽出这些虚假消息,即使设备没有发送任何数据,事件时间窗口也将具有迫使窗口关闭的功能.这里的关键部分是使所有并行实例/操作员都可以访问此伪造的消息,因为我需要用此伪造的消息关闭所有窗口.我认为广播状态可能是实现此目标的最合适方法:提供规则,模式或其他配置消息的服务器."报价来源

So my work around for this is to create an additional external source that does nothing other than pump fake messages. By having these fake messages being pumped out in alignment with my 10 minute aggregation period, even if a device hadn't sent any data, the event time windows would have something to force the windows closed. The critical part here is to make it possible that all parallel instances / operators have access to this fake message because I need to close all the windows with this single fake message. I was thinking that Broadcast state might be the most appropriate way to accomplish this goal given: "Broadcast state is replicated across all parallel instances of a function, and might typically be used where you have two streams, a regular data stream alongside a control stream that serves rules, patterns, or other configuration messages." Quote Source

问题:

  1. 广播状态是否是确保所有并行实例(例如Windows)接收到我的虚假消息的最佳方法?
  2. 一旦操作员可以通过广播状态访问此伪造的消息,那么可以使用该伪造的消息来提前事件时间水印吗?

推荐答案

您可以按照您建议的方式使用广播状态进行此操作,但是我不认为这是最佳解决方案.

You can make this work with broadcast state, along the lines you propose, but I'm not convinced it's the best solution.

在理想情况下,我建议您安排设备发送偶尔的Keepalive消息,但是假设这不可能,我认为自定义触发器将在这里很好地工作.您可以扩展EventTimeTrigger,以便除了通过创建事件时间计时器之外,还可以通过

In an ideal world I'd suggest you arrange for the devices to send occasional keepalive messages, but assuming that's not possible, I think a custom Trigger would work well here. You can extend the EventTimeTrigger so that in addition to the event time timer it creates via

ctx.registerEventTimeTimer(window.maxTimestamp());

您还创建了一个处理时间计时器作为后备,如果该处理时间计时器启动时,如果该窗口仍然存在,则会触发该窗口.

you also create a processing time timer, as a fallback, and you FIRE the window if the window still exists when that processing time timer fires.

我推荐这种方法,因为它更简单,更直接地解决了特定需求.使用广播状态方法,您将必须为这些消息引入源,添加广播状态描述符和流,为非广播流添加特殊的伪造水印(设置为Watermark.MAX_WATERMARK),连接广播和非广播流并实现BroadcastProcessFunction(可能实际上并没有做任何事情)等.很多活动部件分布在几个不同的运算符上.

I'm recommending this approach because it's simpler and more directly addresses the specific need. With the broadcast state approach you'll have to introduce a source for these messages, add a broadcast state descriptor and stream, add special fake watermarks for the non-broadcast stream (set to Watermark.MAX_WATERMARK), connect the broadcast and non-broadcast streams and implement a BroadcastProcessFunction (that probably doesn't really do anything), etc. It's a lot of moving parts spread across several different operators.

这篇关于使用广播状态强制使用伪造消息关闭窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆