Flink 时间特性和 AutoWatermarkInterval [英] Flink Time Characteristic and AutoWatermarkInterval

查看:169
本文介绍了Flink 时间特性和 AutoWatermarkInterval的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Apache Flink 中,setAutoWatermarkInterval(interval) 为下游操作员生成水印,以便他们提前他们的事件时间.

In Apache Flink, setAutoWatermarkInterval(interval) produces watermarks to downstream operators so that they advance their event time.

如果水印在指定的时间间隔内没有改变(没有事件到达),运行时将不会发出任何水印?另一方面,如果新事件在下一个时间间隔之前到达,则新的水印将立即发出或排队/等待,直到达到下一个 setAutoWatermarkInterval 时间间隔.

If the watermark has not been changed during the specified interval (no events arrived) the runtime will not emit any watermarks? On the other hand, if a new event is arrived before the next interval, a new watermark will be immediately emitted or it will be queued/waiting until the next setAutoWatermarkInterval interval is reached.

我很好奇什么是最佳配置 AutoWatermarkInterval(特别是对于高速率源):这个值越小,处理时间和事件时间之间的滞后就越小,但在使用更多 BW 来发送水印的开销.真的准确吗?

I am curious on what is the best configuration AutoWatermarkInterval (especially for high rate sources): the more this value is small, the more lag between processing time and event time will be small, but at the overhead of more BW usage to send the watermarks. Is that true accurate?

另一方面,如果我使用 env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime),Flink 运行时会自动分配时间戳和水印(时间戳对应于事件进入 Flink 数据流管道的时间,即源操作符),尽管如此ingestionTime 我们仍然可以定义一个处理时间计时器(在 processElement 函数中),如下所示:

On the other hand, If I used env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime), Flink runtime will automatically assign timestamps and watermarks (timestamps correspond to the time the event entered the Flink dataflow pipeline i.e. the source operator), nevertheless even with ingestionTime we can still define a processing time timer (in the processElement function) as show below:

long timer = context.timestamp() + Timeout.
context.timerService().registerProcessingTimeTimer(timer);

其中 context.timestamp() 是 Fl​​ink 设置的摄取时间.

where context.timestamp() is the ingestion time set by Flink.

谢谢.

推荐答案

autoWatermarkInterval 只影响关注它的水印生成器.他们还有机会结合事件处理生成水印.

The autoWatermarkInterval only affects watermark generators that pay attention to it. They also have an opportunity to generate a watermark in combination with event processing.

对于那些使用 autoWatermarkInterval(这绝对是正常情况)的水印生成器,他们正在收集下一个水印应该是什么的证据,作为为每个事件分配时间戳的副作用.当计时器触发时(基于 autoWatermarkInterval),Flink 运行时会要求水印生成器生成下一个水印.水印不是在某处等待,也不是排队,而是根据时间戳分配器存储的信息按需创建 - 这通常是目前在流中看到的最大时间戳.

For those watermark generators that use the autoWatermarkInterval (which is definitely the normal case), they are collecting evidence for what the next watermark should be as a side effect of assigning timestamps for each event. When a timer fires (based on the autoWatermarkInterval), the watermark generator is then asked by the Flink runtime to produce the next watermark. The watermark wasn't waiting somewhere, nor was it queued, but rather it is created on demand, based on information that had been stored by the timestamp assigner -- which is typically the maximum timestamp seen so far in the stream.

是的,更频繁的水印意味着更多的通信和处理开销,以及更低的延迟.您必须根据应用程序的要求决定如何处理这种吞吐量/延迟的权衡.

Yes, more frequent watermarks means more overhead to communicate and process them, and lower latency. You have to decide how to handle this throughput/latency tradeoff based on your application's requirements.

无论 TimeCharacteristic 如何,您始终可以使用处理时间计时器.(顺便说一句,在低级别,水印唯一能做的就是触发事件时间计时器,无论是在进程函数、窗口等中)

You can always use processing time timers, regardless of the TimeCharacteristic. (By the way, at a low level, the only thing watermarks do is to trigger event time timers, be they in process functions, windows, etc.)

这篇关于Flink 时间特性和 AutoWatermarkInterval的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆