流中记录的排序 [英] Ordering of Records in Stream

查看:24
本文介绍了流中记录的排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是我的一些疑问:

我有两个不同的流 stream1stream2,其中的元素是有序的.

I have two different streams stream1 and stream2 in which the elements are in order.

1) 现在,当我对这些流中的每一个执行 keyBy 时,会保持顺序吗?(因为这里的每个组只会被发送到一个任务管理器)我的理解是记录将按顺序排列,请在此处纠正我.

1) Now when I do keyBy on each of these streams, will the order be maintained? (Since every group here will be sent to one task manager only ) My understanding is that the records will be in order for a group, correct me here.

2) 在两个流上的 keyBy 之后,我正在做 co-group 以获取匹配和不匹配的记录.订单也会在这里保持吗?因为这也适用于 KeyedStream.我使用 EventTimeAscendingTimestampExtractor 来生成 timestampwatermark.

2) After the keyBy on both of the streams I am doing co-group to get the matching and non-matching records. Will the order be maintained here also?, since this also works on KeyedStream. I am using EventTime, and AscendingTimestampExtractor for generating timestamp and watermark.

3) 现在我想对从 2)​​ 得到的 matching_nonMatchingStream 使用 map/flatmap 执行序列检查.我是否需要在这里再次执行 keyBy ,或者如果我保持在链中,matching_nonMatchingStream 会在同一个 TaskManager 中运行吗?我的理解是,链条将在这里工作,纠正我,感到困惑.

3) Now I want to perform the sequence check on the matching_nonMatchingStream I get from 2) using map/flatmap. Do I need to again perform the keyBy here , or if I keep in chain will the matching_nonMatchingStream run in same TaskManager? My understanding here is that the chain will work here, correct me , getting confused.

4) slotSharingGroup - 你能详细描述一下吗根据 doc :设置此操作的插槽共享组.如果可能,同一插槽共享组中的并行操作实例将位于同一 TaskManager 插槽中.

4) slotSharingGroup - can you please describe more about this according to the doc : Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible.

推荐答案

1) 是和否.Flink 使用所谓的 水印 以跟踪订购.这确保可以将记录分配给正确的窗口,并且在所有数据都可用之前不会关闭窗口.但是,不能保证每个组的严格顺序(因为 并行传入数据).组之间,根本没有顺序保证.

1) Yes and no. Flink uses so-called Watermarks to track the ordering. This ensures that records can be assigned to the correct windows and windows are not closed until all data is available. However, a strict order is not guaranteed per group (because of parallel incoming data). Between groups, there is no ordering guarantee at all.

2) 与 (1) 的答案基本相同.

2) Basically same answer as for (1).

3) 您不需要再次使用 keyBy.map/flatMap 默认会被链接起来.

3) You do not need to use keyBy again. The map/flatMap will be chained by default.

4) 参见 https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/general_arch.html#the-processes

这篇关于流中记录的排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆