流中记录的排序 [英] Ordering of Records in Stream
问题描述
以下是我的一些疑问:
我有两个不同的流 stream1
和 stream2
,其中的元素是有序的.
I have two different streams stream1
and stream2
in which the elements are in order.
1) 现在,当我对这些流中的每一个执行 keyBy
时,会保持顺序吗?(因为这里的每个组只会被发送到一个任务管理器)我的理解是记录将按顺序排列,请在此处纠正我.
1) Now when I do keyBy
on each of these streams, will the order be maintained? (Since every group here will be sent to one task manager only )
My understanding is that the records will be in order for a group, correct me here.
2) 在两个流上的 keyBy
之后,我正在做 co-group 以获取匹配和不匹配的记录.订单也会在这里保持吗?因为这也适用于 KeyedStream
.我使用 EventTime
和 AscendingTimestampExtractor
来生成 timestamp
和 watermark
.
2) After the keyBy
on both of the streams I am doing co-group to get the matching and non-matching records. Will the order be maintained here also?, since this also works on KeyedStream
.
I am using EventTime
, and AscendingTimestampExtractor
for generating timestamp
and watermark
.
3) 现在我想对从 2) 得到的 matching_nonMatchingStream
使用 map/flatmap 执行序列检查.我是否需要在这里再次执行 keyBy
,或者如果我保持在链中,matching_nonMatchingStream
会在同一个 TaskManager
中运行吗?我的理解是,链条将在这里工作,纠正我,感到困惑.
3) Now I want to perform the sequence check on the matching_nonMatchingStream
I get from 2) using map/flatmap.
Do I need to again perform the keyBy
here , or if I keep in chain will the matching_nonMatchingStream
run in same TaskManager
?
My understanding here is that the chain will work here, correct me , getting confused.
4) slotSharingGroup
- 你能详细描述一下吗根据 doc :设置此操作的插槽共享组.如果可能,同一插槽共享组中的并行操作实例将位于同一 TaskManager
插槽中.
4) slotSharingGroup
- can you please describe more about this
according to the doc : Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager
slot, if possible.
推荐答案
1) 是和否.Flink 使用所谓的 水印 以跟踪订购.这确保可以将记录分配给正确的窗口,并且在所有数据都可用之前不会关闭窗口.但是,不能保证每个组的严格顺序(因为 并行传入数据).组之间,根本没有顺序保证.
1) Yes and no. Flink uses so-called Watermarks to track the ordering. This ensures that records can be assigned to the correct windows and windows are not closed until all data is available. However, a strict order is not guaranteed per group (because of parallel incoming data). Between groups, there is no ordering guarantee at all.
2) 与 (1) 的答案基本相同.
2) Basically same answer as for (1).
3) 您不需要再次使用 keyBy
.map
/flatMap
默认会被链接起来.
3) You do not need to use keyBy
again. The map
/flatMap
will be chained by default.
这篇关于流中记录的排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!