Differences between working with states and windows (time) in Flink streaming


Question


Let's say we want to compute the sum and average of the items, and can work with either states or windows (time).

Example working with windows - https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program

Example working with states - https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java

What would be the reasons for choosing one over the other? Can I infer that if the data arrives very irregularly (50% of it arrives within the defined window length and the other 50% does not), the result of the window approach is more biased (because those 50% of events are dropped)?

On the other hand, do we spend more time checking and updating state when working with states?

Answer

First, it depends on your semantics. The two examples use different semantics and are thus not directly comparable. Furthermore, windows also use state internally. In general, it is hard to say which approach is better.
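To make the semantic difference concrete, here is a small plain-Java simulation. It does not use the Flink API, and the `SemanticsSketch` and `Event` names, the timestamps, and the 5-second window size are all hypothetical. The state-style aggregate keeps one running sum and count that is never reset and emits an updated average after every event; the window-style aggregate groups events by timestamp into tumbling windows and emits one average per window. Fed the same four events, the two produce different kinds of output, which is one reason such examples cannot be compared directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation, NOT Flink code: contrasts the output semantics of a
// stateful running aggregate with a tumbling-window aggregate (single key).
final class SemanticsSketch {

    // Hypothetical event type: an event-time timestamp (ms) and a value.
    static final class Event {
        final long ts;
        final double value;
        Event(long ts, double value) { this.ts = ts; this.value = value; }
    }

    // "State" style: one (sum, count) pair that is never reset;
    // an updated average is emitted after every single event.
    static List<Double> runningAverages(List<Event> events) {
        double sum = 0;
        long count = 0;
        List<Double> out = new ArrayList<>();
        for (Event e : events) {
            sum += e.value;
            count++;
            out.add(sum / count);
        }
        return out;
    }

    // "Window" style: each event goes to the tumbling window
    // [start, start + sizeMs) containing its timestamp; one average per window.
    static Map<Long, Double> windowAverages(List<Event> events, long sizeMs) {
        Map<Long, double[]> acc = new TreeMap<>(); // window start -> {sum, count}
        for (Event e : events) {
            long start = e.ts - Math.floorMod(e.ts, sizeMs);
            double[] sc = acc.computeIfAbsent(start, k -> new double[2]);
            sc[0] += e.value;
            sc[1] += 1;
        }
        Map<Long, Double> out = new TreeMap<>();
        for (Map.Entry<Long, double[]> en : acc.entrySet()) {
            out.put(en.getKey(), en.getValue()[0] / en.getValue()[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(1_000, 10), new Event(2_000, 20),
            new Event(6_000, 30), new Event(7_000, 50));
        System.out.println(runningAverages(events));       // [10.0, 15.0, 20.0, 27.5]
        System.out.println(windowAverages(events, 5_000)); // {0=15.0, 5000=40.0}
    }
}
```

In Flink itself, the first style would typically be written with keyed state such as `ValueState` inside a `RichFlatMapFunction`, and the second with `keyBy(...).window(...)`; this sketch only mirrors their output semantics.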

As Flink's window semantics are very rich, I would suggest using windows. If you cannot express your semantics with windows, using state can be a good alternative. Windows have the additional advantage that state handling (which is hard to get right) is done automatically for you.

The decision is definitely independent of your data arrival rate. Flink does not drop any data. If you work with event time (rather than processing time), your result will be the same regardless of the data arrival rate.
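The event-time point can be illustrated with another plain-Java simulation. Again this is not Flink code; `EventTimeSketch`, the `eventTs`/`arrivalTs` fields, and the timings are hypothetical. Each event carries the time it happened and the time it reached the job. When windows are assigned from the event timestamp, delaying one event's arrival does not change the per-window sums; when they are assigned from arrival time (processing time), the delayed event lands in a different window. The sketch assumes every event is eventually seen and ignores watermarks and allowed lateness, which a real Flink event-time job has to configure.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation, NOT Flink code. Watermarks and lateness are ignored:
// every event is assumed to be seen eventually.
final class EventTimeSketch {

    // Hypothetical event: when it happened, when it arrived, and its value.
    static final class Event {
        final long eventTs;
        final long arrivalTs;
        final double value;
        Event(long eventTs, long arrivalTs, double value) {
            this.eventTs = eventTs;
            this.arrivalTs = arrivalTs;
            this.value = value;
        }
    }

    // Sums values per tumbling window of sizeMs. With useEventTime the window
    // is chosen from eventTs, otherwise from arrivalTs (processing time).
    static Map<Long, Double> windowSums(List<Event> events, long sizeMs, boolean useEventTime) {
        Map<Long, Double> sums = new TreeMap<>();
        for (Event e : events) {
            long ts = useEventTime ? e.eventTs : e.arrivalTs;
            long start = ts - Math.floorMod(ts, sizeMs);
            sums.merge(start, e.value, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Same three events; in the delayed run the third one arrives much later.
        List<Event> onTime = List.of(
            new Event(1_000, 1_100, 5), new Event(2_000, 2_100, 7), new Event(3_000, 3_100, 9));
        List<Event> delayed = List.of(
            new Event(1_000, 1_100, 5), new Event(2_000, 2_100, 7), new Event(3_000, 12_000, 9));
        System.out.println(windowSums(onTime, 5_000, true));   // {0=21.0}
        System.out.println(windowSums(delayed, 5_000, true));  // {0=21.0}
        System.out.println(windowSums(delayed, 5_000, false)); // {0=12.0, 10000=9.0}
    }
}
```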
