What are the differences between KTable vs GlobalKTable and leftJoin() vs outerJoin()?
Question
In the Kafka Streams library, I want to know the difference between KTable and GlobalKTable.
Also, the KStream class has two methods, leftJoin() and outerJoin(). What is the difference between these two methods?
I read KStream.leftJoin, but could not find the exact difference.
Recommended answer
KTable vs GlobalKTable
A KTable shards the data across all running Kafka Streams instances, while a GlobalKTable holds a full copy of all data on each instance. The disadvantage of a GlobalKTable is that it needs more memory. The advantage is that you can do a KStream-GlobalKTable join on a non-key attribute of the stream. A KStream-KTable join on a non-key stream attribute is only possible by extracting the join attribute and setting it as the key before the join, which adds a repartitioning step for the stream before the join can be computed.
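To make the sharded versus fully replicated layout concrete, here is a minimal sketch in plain Java. It is a toy model, not Kafka Streams code: `TableShardingSketch`, `Order`, `partitionFor`, and the hash-modulo partitioner are all hypothetical names and assumptions introduced for illustration, and plain maps stand in for each instance's local table state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model only: plain maps stand in for table state; this is not Kafka Streams code. */
final class TableShardingSketch {

    /** Hypothetical stream record: keyed by orderId, with customerId as a non-key attribute. */
    static final class Order {
        final String orderId;
        final String customerId;

        Order(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }

        String customerId() {
            return customerId;
        }
    }

    /** Assumed partitioner for this sketch: key hash modulo the number of instances. */
    static int partitionFor(String key, int instances) {
        return Math.floorMod(key.hashCode(), instances);
    }

    /** KTable-like layout: each instance holds only the keys of its own partition. */
    static List<Map<String, String>> shard(Map<String, String> table, int instances) {
        List<Map<String, String>> shards = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            shards.add(new HashMap<>());
        }
        table.forEach((k, v) -> shards.get(partitionFor(k, instances)).put(k, v));
        return shards;
    }

    /** GlobalKTable-like layout: every instance holds a full copy of the table. */
    static List<Map<String, String>> replicate(Map<String, String> table, int instances) {
        List<Map<String, String>> copies = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            copies.add(new HashMap<>(table));
        }
        return copies;
    }

    /** Looks up a key derived from the stream record in one instance's local state. */
    static String lookup(Map<String, String> localState, Order order,
                         Function<Order, String> keyMapper) {
        return localState.get(keyMapper.apply(order));
    }
}
```

With the replicated layout, a lookup by `customerId` succeeds on whichever instance happens to process the order. With the sharded layout, only the instance that owns that key can answer, which is why the stream has to be re-keyed and repartitioned first. In the actual API this corresponds to the `KeyValueMapper` argument that the KStream-GlobalKTable `join()` accepts, versus calling something like `selectKey()` on the stream before a KStream-KTable join.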
Note, though, that there is also a semantic difference: for a stream-table join, Kafka Streams aligns record processing based on record timestamps, so updates to the table are aligned with the records of your stream. For a GlobalKTable there is no time synchronization, so updates to the GlobalKTable are completely decoupled from the processing of the stream records (thus, you get weaker semantics).
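The semantic difference can be illustrated with an idealized model in plain Java. This is only a sketch under assumptions: `TimeAlignmentSketch` and its methods are hypothetical, the table is reduced to the version history of a single key, and the "decoupled" case assumes the global table has already applied every update by the time the stream record is processed. In a real application, what a GlobalKTable lookup sees depends on timing.

```java
import java.util.Map;
import java.util.TreeMap;

/** Idealized model of time-aligned vs decoupled table lookups; not Kafka Streams code. */
final class TimeAlignmentSketch {

    /**
     * Time-aligned lookup (stream-table join model): the stream record sees the
     * latest table value whose timestamp is not after the record's timestamp.
     */
    static String timeAlignedLookup(TreeMap<Long, String> versions, long streamTimestamp) {
        Map.Entry<Long, String> entry = versions.floorEntry(streamTimestamp);
        return entry == null ? null : entry.getValue();
    }

    /**
     * Decoupled lookup (GlobalKTable model, assuming all updates were already applied):
     * the stream record sees the newest value regardless of its own timestamp.
     */
    static String decoupledLookup(TreeMap<Long, String> versions) {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }
}
```

For a key updated to `v1` at timestamp 5 and to `v2` at timestamp 10, a stream record with timestamp 7 joins against `v1` in the time-aligned model, but may join against `v2` in the decoupled model.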
For more details, see KIP-99: Add Global Tables to Kafka Streams.
About left and outer joins: they correspond to a left outer join and a full outer join in a database, respectively.
For a left outer join, you might "lose" data from your right input stream when a right-side record has no match on the left-hand side.
For a (full) outer join, no data is dropped: each input record of both streams appears in the result stream.
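The difference between the two join types can be sketched over plain keyed maps. This is a simplification, not the Kafka Streams API: `JoinSemanticsSketch` is a hypothetical helper, each side holds at most one value per key, join windows are ignored, and the joined value is just the two sides concatenated with `|`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Simplified left vs full outer join over keyed maps; not Kafka Streams code. */
final class JoinSemanticsSketch {

    /** Left join: every left record is emitted; unmatched right records are dropped. */
    static Map<String, String> leftJoin(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new TreeMap<>();
        // A missing right value shows up as "null" in the joined string.
        left.forEach((key, leftValue) -> out.put(key, leftValue + "|" + right.get(key)));
        return out;
    }

    /** Full outer join: every record from both sides is emitted, matched or not. */
    static Map<String, String> outerJoin(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new TreeMap<>();
        TreeSet<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        for (String key : keys) {
            out.put(key, left.get(key) + "|" + right.get(key));
        }
        return out;
    }
}
```

With left = {a=L1, b=L2} and right = {b=R2, c=R3}, the left join yields entries for `a` and `b` only, so `R3` is lost, while the outer join also yields an entry for `c` with a null left side.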