Ordering of Records in Stream
Question
Here are some of my questions:
I have two different streams, stream1 and stream2, in which the elements are in order.
1) When I do keyBy on each of these streams, will the order be maintained? (Every group here will be sent to only one task manager.)
My understanding is that the records will stay in order within a group; correct me if I am wrong.
2) After the keyBy on both streams, I do a co-group to get the matching and non-matching records. Will the order be maintained here as well, since this also works on a KeyedStream?
I am using EventTime, with AscendingTimestampExtractor for generating timestamps and watermarks.
3) Now I want to perform a sequence check on the matching_nonMatchingStream that I get from 2), using map/flatMap.
Do I need to perform keyBy again here, or, if I keep the operators chained, will the matching_nonMatchingStream run in the same TaskManager?
My understanding is that chaining will work here; correct me if I am wrong, as I am getting confused.
4) slotSharingGroup: can you please describe this in more detail?
According to the doc: Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible.
Recommended answer
1) Yes and no. Flink uses so-called watermarks to track progress in event time. This ensures that records can be assigned to the correct windows and that windows are not closed until the watermark indicates that their data has arrived. However, a strict order is not guaranteed per group: order is preserved only between one sending and one receiving parallel subtask, so records of the same key that arrive from different upstream subtasks can interleave. Between groups, there is no ordering guarantee at all.
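The per-group caveat in answer 1 can be illustrated with a small model. The following is a minimal sketch in plain Python, not Flink code: route, shuffle_by_key, and per_key_timestamps are hypothetical names, the byte-sum routing is only a stand-in for Flink's real key-to-subtask assignment, and a seeded random generator stands in for the arbitrary order in which a receiver drains its input channels.

```python
import random
from collections import defaultdict, deque


def route(key, parallelism):
    # Hypothetical stand-in for key-to-subtask assignment:
    # any deterministic function of the key behaves the same way here.
    return sum(key.encode()) % parallelism


def shuffle_by_key(senders, parallelism, seed):
    """Deliver each sender's records to the receiver chosen by key.

    Every (sender, receiver) pair is a FIFO channel. A receiver drains its
    channels in an arbitrary interleaving, modeled by a seeded RNG.
    """
    rng = random.Random(seed)
    channels = defaultdict(deque)  # (sender_id, receiver_id) -> FIFO queue
    for sender_id, records in enumerate(senders):
        for key, ts in records:
            receiver_id = route(key, parallelism)
            channels[(sender_id, receiver_id)].append((sender_id, key, ts))

    received = defaultdict(list)  # receiver_id -> records in arrival order
    pending = list(channels)      # channels that still hold records
    while pending:
        chan = rng.choice(pending)
        received[chan[1]].append(channels[chan].popleft())
        if not channels[chan]:
            pending.remove(chan)
    return received


def per_key_timestamps(received):
    """Collect, for each key, the timestamps in the order they arrived."""
    result = defaultdict(list)
    for records in received.values():
        for _sender_id, key, ts in records:
            result[key].append(ts)
    return dict(result)


# Two upstream subtasks, each emitting timestamps in ascending order.
senders = [
    [("a", 1), ("b", 2), ("a", 5)],
    [("a", 3), ("b", 4), ("a", 6)],
]
received = shuffle_by_key(senders, parallelism=2, seed=7)
print(per_key_timestamps(received))
```

In this model every key always lands on a single receiver, and records from the same sender keep their relative order. What can change from run to run is how the two senders' records for the same key are interleaved, which is the situation the answer describes as "parallel incoming data".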
2) Basically the same answer as for (1).
3) You do not need to use keyBy again. The map/flatMap will be chained by default.
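For the sequence check itself, the logic of such a flatMap might look like the sketch below. It is plain Python rather than Flink code: SequenceChecker and check are hypothetical names, records are assumed to be (key, timestamp) pairs, and an ordinary dictionary stands in for whatever per-key state the real operator would keep.

```python
class SequenceChecker:
    """Flat-map style checker that reports records arriving out of order."""

    def __init__(self):
        self.last_ts = {}  # key -> largest timestamp accepted so far

    def flat_map(self, record):
        """Return a list with one violation, or an empty list if in order."""
        key, ts = record
        previous = self.last_ts.get(key)
        if previous is not None and ts < previous:
            # The record is older than one already seen for this key.
            return [(key, previous, ts)]
        self.last_ts[key] = ts
        return []


def check(records):
    """Run the checker over an iterable of (key, timestamp) records."""
    checker = SequenceChecker()
    violations = []
    for record in records:
        violations.extend(checker.flat_map(record))
    return violations


stream = [("a", 1), ("b", 2), ("a", 5), ("a", 3), ("b", 4)]
print(check(stream))
```

The check only compares records that share a key, which matches the answers above: ordering is meaningful within a group at best, and never across groups. Equal timestamps are treated as in order here; whether that is right depends on the data.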