What are the differences between KTable vs GlobalKTable and leftJoin() vs outerJoin()?
Question
In the Kafka Streams library, I want to know the difference between KTable and GlobalKTable.
Also, the KStream class has two methods, leftJoin() and outerJoin(). What is the difference between these two methods?
I read the KStream.leftJoin documentation, but did not manage to find an exact difference.
Recommended answer
KTable vs GlobalKTable
A KTable shards its data across all running Kafka Streams instances, while a GlobalKTable holds a full copy of all data on each instance. The disadvantage of a GlobalKTable is that it needs more memory. The advantage is that you can do a KStream-GlobalKTable join on a non-key attribute of the stream. A KStream-KTable join on a non-key stream attribute is only possible by extracting the join attribute and setting it as the key before the join, which causes the stream to be repartitioned before the join can be computed.
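The sharding argument above can be illustrated with a small plain-Java simulation. This is only a sketch, not Kafka Streams code: the class `TableShardingSketch`, the modulo partitioner, and the order/customer data are all hypothetical stand-ins chosen for illustration (Kafka's real partitioner hashes the serialized key). It shows why a lookup by a non-key attribute can miss in a sharded table, succeeds against a full copy, and succeeds again once the stream record is re-keyed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of sharded vs. fully replicated table state.
// Nothing here is Kafka Streams API; all names are hypothetical.
class TableShardingSketch {
    static final int INSTANCES = 2;

    // Hypothetical partitioner: integer key modulo the number of instances.
    static int instanceFor(int key) {
        return Math.floorMod(key, INSTANCES);
    }

    // KTable-like layout: each instance stores only the rows whose key maps to it.
    static List<Map<Integer, String>> shard(Map<Integer, String> table) {
        List<Map<Integer, String>> stores = new ArrayList<>();
        for (int i = 0; i < INSTANCES; i++) {
            stores.add(new HashMap<>());
        }
        table.forEach((key, value) -> stores.get(instanceFor(key)).put(key, value));
        return stores;
    }

    // GlobalKTable-like layout: every instance stores a full copy of the table.
    static List<Map<Integer, String>> replicate(Map<Integer, String> table) {
        List<Map<Integer, String>> stores = new ArrayList<>();
        for (int i = 0; i < INSTANCES; i++) {
            stores.add(new HashMap<>(table));
        }
        return stores;
    }

    // A stream record is processed on the instance that owns its key,
    // so the lookup can only see that instance's local store.
    static String lookup(List<Map<Integer, String>> stores, int streamKey, int joinAttribute) {
        return stores.get(instanceFor(streamKey)).get(joinAttribute);
    }

    public static void main(String[] args) {
        Map<Integer, String> customers = Map.of(2, "Alice"); // customerId -> name
        int orderId = 1;    // key of the stream record
        int customerId = 2; // non-key attribute used for the join

        // Sharded table, stream keyed by orderId: the row lives on another instance.
        System.out.println(lookup(shard(customers), orderId, customerId));     // null
        // Replicated table: the row is available on every instance.
        System.out.println(lookup(replicate(customers), orderId, customerId)); // Alice
        // Sharded table after re-keying the stream by customerId (the repartitioning step).
        System.out.println(lookup(shard(customers), customerId, customerId));  // Alice
    }
}
```

In the real DSL, the re-keying step roughly corresponds to calling `selectKey` on the stream before a KStream-KTable join, while the KStream-GlobalKTable `join` overload takes a key selector so no repartitioning is needed.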
Note, though, that there is also a semantic difference: for a stream-table join, Kafka Streams aligns record processing based on record timestamps, so updates to the table are aligned with the records of your stream. For a GlobalKTable, there is no time synchronization, so updates to the GlobalKTable are completely decoupled from the processing of the stream records (and you get weaker semantics).
For further details, see KIP-99: Add Global Tables to Kafka Streams.
About left and outer joins: they behave like a left outer join and a full outer join in a database, respectively.
For a left outer join, you might "lose" data from your right input stream if there is no match for it on the left-hand side.
For a (full) outer join, no data is dropped, and each input record of both streams will be in the result stream.
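To make the left vs. full outer distinction concrete, here is a minimal plain-Java sketch of the two semantics. It is deliberately simplified and is not Kafka Streams code: real KStream-KStream joins are windowed and operate on records as they arrive, whereas this sketch assumes one value per key on each side and ignores time. The class `JoinSemanticsSketch`, the `joiner` helper, and the sample data are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified, non-windowed illustration of left vs. full outer join results.
// All names are hypothetical; this is not the Kafka Streams API.
class JoinSemanticsSketch {
    // Stand-in for a value joiner: a missing side appears as null.
    static String joiner(String left, String right) {
        return left + "/" + right;
    }

    // Left join: one result per left key; right-only keys are dropped.
    static Map<String, String> leftJoin(Map<String, String> left, Map<String, String> right) {
        Map<String, String> result = new TreeMap<>();
        left.forEach((key, value) -> result.put(key, joiner(value, right.get(key))));
        return result;
    }

    // Full outer join: one result per key that appears on either side.
    static Map<String, String> outerJoin(Map<String, String> left, Map<String, String> right) {
        Set<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<String, String> result = new TreeMap<>();
        for (String key : keys) {
            result.put(key, joiner(left.get(key), right.get(key)));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> left = Map.of("k1", "A", "k2", "B");
        Map<String, String> right = Map.of("k2", "x", "k3", "y");

        System.out.println(leftJoin(left, right));  // {k1=A/null, k2=B/x}
        System.out.println(outerJoin(left, right)); // {k1=A/null, k2=B/x, k3=null/y}
    }
}
```

The right-only key `k3` is the data that gets "lost" in the left join and survives in the outer join.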