What are the differences between KTable vs GlobalKTable and leftJoin() vs outerJoin()?


Problem description


In the Kafka Streams library, I want to know the difference between KTable and GlobalKTable.


Also, the KStream class has two methods, leftJoin() and outerJoin(). What is the difference between these two methods?


I read the KStream.leftJoin documentation, but could not find the exact difference.

Recommended answer

KTable VS GlobalKTable



A KTable shards the data across all running Kafka Streams instances, while a GlobalKTable keeps a full copy of all data on each instance. The disadvantage of GlobalKTable is that it obviously needs more memory, since every instance stores the whole table. The advantage is that you can do a KStream-GlobalKTable join on a non-key attribute of the stream. For a KStream-KTable join, joining on a non-key stream attribute is only possible by first extracting the join attribute and setting it as the record key; this results in a repartitioning step for the stream before the join can be computed.
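To make the sharding argument concrete, here is a minimal plain-Java simulation, not the Kafka Streams API. It assumes two instances and uses `String.hashCode()` modulo the instance count as a simplified partitioner; the class `ShardingDemo`, its methods, and the order/customer data are all hypothetical. It shows that a lookup on a non-key attribute can miss against a local KTable shard, succeeds once the record is re-keyed (which is what the repartition step achieves), and always succeeds against a full GlobalKTable-style copy. In the real DSL, the GlobalKTable case corresponds to `StreamsBuilder#globalTable` plus `KStream#join(GlobalKTable, KeyValueMapper, ValueJoiner)`, where the `KeyValueMapper` extracts the non-key attribute, while the KTable case would first call `selectKey(...)` on the stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical plain-Java simulation; not the Kafka Streams API.
class ShardingDemo {
    static final int INSTANCES = 2;

    // Simplified stand-in for partitioning: the real default partitioner hashes
    // the serialized key bytes (murmur2), not String.hashCode().
    static int instanceFor(String key) {
        return Math.floorMod(key.hashCode(), INSTANCES);
    }

    // KTable-like layout: each instance stores only the keys routed to it.
    static List<Map<String, String>> shard(Map<String, String> table) {
        List<Map<String, String>> shards = new ArrayList<>();
        for (int i = 0; i < INSTANCES; i++) {
            shards.add(new HashMap<>());
        }
        table.forEach((k, v) -> shards.get(instanceFor(k)).put(k, v));
        return shards;
    }

    // A stream record keyed by orderId is processed on the instance owning orderId,
    // but the join uses a non-key attribute (customerId) against that local shard only.
    static Optional<String> joinWithoutRepartition(List<Map<String, String>> shards,
                                                   String orderId, String customerId) {
        return Optional.ofNullable(shards.get(instanceFor(orderId)).get(customerId));
    }

    // After re-keying the record by customerId (what the repartition step achieves),
    // it is processed on the instance owning that customer, so the lookup succeeds.
    static Optional<String> joinAfterRepartition(List<Map<String, String>> shards, String customerId) {
        return Optional.ofNullable(shards.get(instanceFor(customerId)).get(customerId));
    }

    // GlobalKTable-like layout: every instance has the full table, so no re-keying is needed.
    static Optional<String> joinWithGlobalCopy(Map<String, String> fullCopy, String customerId) {
        return Optional.ofNullable(fullCopy.get(customerId));
    }

    public static void main(String[] args) {
        Map<String, String> customers = Map.of("c1", "Alice", "c2", "Bob", "c3", "Carol");
        List<Map<String, String>> shards = shard(customers);
        // Order "o1" references customer "c2"; with this hash they land on different instances.
        System.out.println("KTable, no repartition:    " + joinWithoutRepartition(shards, "o1", "c2"));
        System.out.println("KTable, after repartition: " + joinAfterRepartition(shards, "c2"));
        System.out.println("GlobalKTable:              " + joinWithGlobalCopy(customers, "c2"));
    }
}
```

The trade-off is visible in the data layout: the sharded version stores each key once across the cluster, while the global version stores every key on every instance in exchange for never needing to re-key the stream.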


Note, though, that there is also a semantic difference: for a stream-table join, Kafka Streams aligns record processing based on record timestamps. Thus, updates to the table are aligned with the records of your stream. For a GlobalKTable, there is no time synchronization, so updates to the GlobalKTable are completely decoupled from the processing of the stream records (thus, you get weaker semantics).
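The timestamp-alignment difference can be sketched with another plain-Java simulation, not Kafka Streams internals. The class `TimeAlignmentDemo` and its event data are hypothetical. The "aligned" variant processes table updates and stream records in timestamp order (a simplification of the best-effort ordering Kafka Streams performs for KStream-KTable joins). The "decoupled" variant models just one possible GlobalKTable interleaving, in which every table update has already been applied before the stream record is processed, as can happen when the global table has already caught up to its latest value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java simulation; not Kafka Streams internals.
class TimeAlignmentDemo {
    // One input event: either a table update or a stream record, both keyed.
    record Event(long timestamp, boolean isTableUpdate, String key, String value) {}

    // Stream-table join (KTable): events are processed in timestamp order, so a stream
    // record sees the table value that was current at its own timestamp.
    static List<String> joinTimeAligned(List<Event> events) {
        List<Event> ordered = new ArrayList<>(events);
        ordered.sort(Comparator.comparingLong(Event::timestamp));
        Map<String, String> table = new HashMap<>();
        List<String> results = new ArrayList<>();
        for (Event e : ordered) {
            if (e.isTableUpdate()) {
                table.put(e.key(), e.value());
            } else {
                results.add(e.value() + "+" + table.get(e.key()));
            }
        }
        return results;
    }

    // Stream-GlobalKTable join: table updates are applied independently of stream time.
    // This models one possible interleaving, where every update (in arrival order)
    // has already been applied before the stream record is processed.
    static List<String> joinDecoupled(List<Event> events) {
        Map<String, String> table = new HashMap<>();
        for (Event e : events) {
            if (e.isTableUpdate()) {
                table.put(e.key(), e.value());
            }
        }
        List<String> results = new ArrayList<>();
        for (Event e : events) {
            if (!e.isTableUpdate()) {
                results.add(e.value() + "+" + table.get(e.key()));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1, true, "k", "v1"),      // table: k -> v1 at t=1
                new Event(10, true, "k", "v2"),     // table: k -> v2 at t=10
                new Event(5, false, "k", "order")); // stream record at t=5
        System.out.println("KTable join:       " + joinTimeAligned(events));
        System.out.println("GlobalKTable join: " + joinDecoupled(events));
    }
}
```

In the aligned variant the stream record at t=5 joins with `v1`, the table value valid at that time; in the decoupled variant it may instead see `v2`, a table update with a later timestamp.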

For more details, see KIP-99: Add Global Tables to Kafka Streams.


About leftJoin() and outerJoin(): they correspond to a left outer join and a full outer join in a database, respectively.


For a left outer join, you might "lose" data from your right input stream if a right-hand record has no matching join partner on the left-hand side.


For a (full) outer join, no data is dropped: every input record of both streams will appear in the result stream.
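The database analogy can be illustrated with a minimal in-memory sketch in plain Java. It is a hypothetical illustration (the class `JoinSemanticsDemo` and its record format are made up), and it deliberately ignores the join windows (`JoinWindows`) that real KStream-KStream joins use; it only shows which records survive each join type, with `null` standing in for the missing side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical database-style join on in-memory lists; ignores Kafka join windows.
class JoinSemanticsDemo {
    record Rec(String key, String value) {}

    // Left outer join: every left record appears (padded with null if unmatched);
    // right records without a left match are dropped.
    static List<String> leftJoin(List<Rec> left, List<Rec> right) {
        List<String> out = new ArrayList<>();
        for (Rec l : left) {
            boolean matched = false;
            for (Rec r : right) {
                if (l.key().equals(r.key())) {
                    out.add(l.key() + ":" + l.value() + "," + r.value());
                    matched = true;
                }
            }
            if (!matched) {
                out.add(l.key() + ":" + l.value() + ",null");
            }
        }
        return out;
    }

    // Full outer join: the left join result plus every right record that had no
    // left match (padded with null on the left), so no input record is dropped.
    static List<String> outerJoin(List<Rec> left, List<Rec> right) {
        List<String> out = leftJoin(left, right);
        for (Rec r : right) {
            boolean matched = left.stream().anyMatch(l -> l.key().equals(r.key()));
            if (!matched) {
                out.add(r.key() + ":null," + r.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> left = List.of(new Rec("A", "a1"), new Rec("B", "b1"));
        List<Rec> right = List.of(new Rec("A", "a2"), new Rec("C", "c2"));
        System.out.println("leftJoin:  " + leftJoin(left, right));
        System.out.println("outerJoin: " + outerJoin(left, right));
    }
}
```

Here the right-hand record with key `C` is missing from the left join result but present in the outer join result, which is exactly the data the left join "loses".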
