Ordering of Records in Stream


Question

Here are some of my questions:

I have two different streams, stream1 and stream2, in which the elements are in order.

1) When I do keyBy on each of these streams, will the order be maintained? (Every group here will be sent to only one task manager.) My understanding is that the records will stay in order within a group; correct me if I am wrong.

2) After the keyBy on both streams, I do a co-group to get the matching and non-matching records. Will the order be maintained here as well, since this also works on a KeyedStream? I am using EventTime, with AscendingTimestampExtractor to generate timestamps and watermarks.

3) Now I want to perform a sequence check on the matching_nonMatchingStream I get from 2) using map/flatMap. Do I need to perform the keyBy again here, or, if I keep the operators chained, will matching_nonMatchingStream run in the same TaskManager? My understanding is that chaining will work here; correct me if I am wrong, as I am getting confused.

4) slotSharingGroup: can you please explain this in more detail? According to the doc: "Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible."

Recommended answer

1) Yes and no. Flink uses so-called watermarks to track ordering. They ensure that records can be assigned to the correct windows and that windows are not closed until all of their data is available. However, strict order is not guaranteed within a group, because data can arrive in parallel. Between groups, there is no ordering guarantee at all.
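The per-group caveat in (1) can be illustrated without Flink. The following is a minimal plain-Java sketch, not Flink code: the names `Rec`, `KeyedOrderingSketch`, `partition`, and `orderedPerKey` are hypothetical, and routing by hash modulo is only an assumed stand-in for how keyBy sends each key to one parallel instance. It shows that records from a single ordered producer keep their relative order per key after partitioning, and it provides a check that fails when two producers interleave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record type: a key plus a sequence number assigned by the producer.
final class Rec {
    final String key;
    final int seq;

    Rec(String key, int seq) {
        this.key = key;
        this.seq = seq;
    }
}

final class KeyedOrderingSketch {

    // Route each record to one of `parallelism` channels by key hash,
    // appending in arrival order (an assumed stand-in for keyBy routing).
    static Map<Integer, List<Rec>> partition(List<Rec> input, int parallelism) {
        Map<Integer, List<Rec>> channels = new HashMap<>();
        for (Rec r : input) {
            int channel = Math.floorMod(r.key.hashCode(), parallelism);
            channels.computeIfAbsent(channel, c -> new ArrayList<>()).add(r);
        }
        return channels;
    }

    // True if sequence numbers never decrease for any single key in the list.
    static boolean orderedPerKey(List<Rec> records) {
        Map<String, Integer> lastSeq = new HashMap<>();
        for (Rec r : records) {
            Integer prev = lastSeq.put(r.key, r.seq);
            if (prev != null && prev > r.seq) {
                return false;
            }
        }
        return true;
    }
}
```

With one producer, every channel returned by `partition` passes `orderedPerKey`. If two producers each emit records for the same key and the receiver sees them interleaved (for example sequence 2 from one producer before sequence 1 from the other), `orderedPerKey` returns false, which is the situation the answer describes as "parallel incoming data".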

2) Basically the same answer as for (1).

3) You do not need to use keyBy again. The map/flatMap will be chained by default.
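As an illustration of the sequence check asked about in question 3, here is a minimal plain-Java sketch in the shape of a flatMap (zero or one output per input). It is not Flink code: the class `SequenceCheckSketch`, its `flatMap` signature, and the idea that each record carries a numeric sequence number are all assumptions for the example. It also assumes that all records for a given key reach the same instance, and the plain in-memory map it uses is not part of Flink's checkpointed state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-key sequence checker, shaped like a flatMap:
// each input yields either nothing or one violation message.
final class SequenceCheckSketch {

    // Last accepted sequence number per key; local to this one instance.
    private final Map<String, Long> lastSeqByKey = new HashMap<>();

    List<String> flatMap(String key, long seq) {
        List<String> out = new ArrayList<>();
        Long prev = lastSeqByKey.get(key);
        if (prev != null && seq <= prev) {
            // The record is a duplicate or arrived after a later one for this key.
            out.add("out of order: key=" + key + " seq=" + seq + " after " + prev);
        } else {
            lastSeqByKey.put(key, seq);
        }
        return out;
    }
}
```

Calling `flatMap("a", 1)` and then `flatMap("a", 2)` returns empty lists, while a second `flatMap("a", 2)` returns a single violation message. Keys are tracked independently, so `flatMap("b", 1)` after that is still accepted.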

4) See https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/general_arch.html#the-processes
