What are the differences between KTable vs GlobalKTable and leftJoin() vs outerJoin()?


Question

In the Kafka Streams library, I want to know the difference between KTable and GlobalKTable.

Also, the KStream class has two methods, leftJoin() and outerJoin(). What is the difference between these two methods?

I read KStream.leftJoin, but could not find the exact difference.

Recommended answer

KTable VS GlobalKTable

A KTable shards the data across all running Kafka Streams instances, while a GlobalKTable holds a full copy of all data on each instance. The disadvantage of a GlobalKTable is that it needs more memory. The advantage is that you can do a KStream-GlobalKTable join on a non-key attribute of the stream. A KStream-KTable join on a non-key stream attribute is only possible by extracting the join attribute and setting it as the key before the join, which triggers a repartitioning step of the stream before the join can be computed.
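To make the sharding argument concrete, here is a minimal plain-Java model; it is not the Kafka Streams API. The class TableShardingModel, its method names, and the modulo partitioner are all hypothetical stand-ins (Kafka itself partitions on a hash of the serialized key bytes). It only illustrates why a lookup against a sharded table is reliable when the record is keyed by the join attribute, while a full copy can be queried by any attribute.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TableShardingModel {

    // Stand-in partitioner (assumption): the real one hashes the serialized key bytes.
    static int partitionFor(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    // KTable-like layout: each instance holds only the rows whose key maps to its partition.
    static List<Map<Integer, String>> shard(Map<Integer, String> table, int numPartitions) {
        List<Map<Integer, String>> shards = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            shards.add(new HashMap<>());
        }
        table.forEach((k, v) -> shards.get(partitionFor(k, numPartitions)).put(k, v));
        return shards;
    }

    // A stream record keyed by recordKey is processed by the instance owning that partition,
    // so it can only see the table rows stored in the same shard.
    static String lookupSharded(List<Map<Integer, String>> shards, int recordKey, int customerId) {
        return shards.get(partitionFor(recordKey, shards.size())).get(customerId);
    }

    // GlobalKTable-like layout: every instance has the full copy, so any attribute can be looked up.
    static String lookupGlobal(Map<Integer, String> fullCopy, int customerId) {
        return fullCopy.get(customerId);
    }
}
```

In this model, with customers 1 and 2 spread over two partitions, an order keyed by order id 10 lands on partition 0 and does not find customer 1 there. Re-keying the order by its customer id (the repartitioning step) sends it to the partition that holds customer 1, and the global copy finds the customer either way.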

Note, though, that there is also a semantic difference: for a stream-table join, Kafka Streams aligns record processing based on record timestamps. Thus, the updates to the table are aligned with the records of your stream. For a GlobalKTable, there is no time synchronization, so updates to the GlobalKTable are completely decoupled from the processing of the stream records (thus, you get weaker semantics).
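The difference in semantics can be sketched with an idealized plain-Java model for a single table key; it is not Kafka Streams code. TimeSyncModel and its methods are hypothetical names. The "unsynced" case assumes, purely for illustration, that all updates were already loaded before the stream record is processed; in practice the visible state depends on when the update happens to arrive.

```java
import java.util.Map;
import java.util.TreeMap;

final class TimeSyncModel {

    // Time-aligned view (KStream-KTable join, idealized): a stream record with timestamp
    // streamTs sees the latest table update whose timestamp is <= streamTs.
    static String timeAlignedLookup(TreeMap<Long, String> updatesByTimestamp, long streamTs) {
        Map.Entry<Long, String> entry = updatesByTimestamp.floorEntry(streamTs);
        return entry == null ? null : entry.getValue();
    }

    // Decoupled view (GlobalKTable, under the assumption that every update is already loaded):
    // the stream record sees the newest value regardless of its own timestamp.
    static String unsyncedLookup(TreeMap<Long, String> updatesByTimestamp) {
        return updatesByTimestamp.isEmpty() ? null : updatesByTimestamp.lastEntry().getValue();
    }
}
```

With table updates at timestamps 5 and 20, a stream record at timestamp 10 joins against the value from timestamp 5 in the time-aligned model, but against the value from timestamp 20 in the decoupled model.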

For further details, see KIP-99: Add Global Tables to Kafka Streams.

About left and outer joins: as in a database, they correspond to a left outer join and a full outer join, respectively.

For a left outer join, you might "lose" data from your right input stream if there is no match for it on the left-hand side.

For a (full) outer join, no data is dropped, and every input record of both streams appears in the result stream.
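The two join types can be compared with a simplified plain-Java model. It treats each input as a finished key-value map and ignores join windows and the order in which a real stream join emits results, so it is a sketch of the database-style semantics only. JoinModel and its methods are hypothetical names, not part of Kafka Streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class JoinModel {

    // Left join: one result per left-side key; a missing right value shows up as null.
    // Right-side keys without a left match are dropped.
    static List<String> leftJoin(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (String key : new TreeSet<>(left.keySet())) {
            out.add(key + "=" + left.get(key) + "+" + right.get(key));
        }
        return out;
    }

    // Full outer join: one result per key present on either side; the missing side is null.
    static List<String> outerJoin(Map<String, String> left, Map<String, String> right) {
        TreeSet<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            out.add(key + "=" + left.get(key) + "+" + right.get(key));
        }
        return out;
    }
}
```

For a left input with keys a and b and a right input with keys b and c, leftJoin yields results for a and b only, so the right-side record for c is lost, while outerJoin also yields a result for c with a null left value.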
