卡夫卡使用的内部主题是什么? [英] What are internal topics used in Kafka?

查看:75
本文介绍了卡夫卡使用的内部主题是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们正在使用kafka流api进行聚合,在其中我们还使用了分组依据. 我们还使用状态存储来保存输入主题数据.

我注意到的是

Kafka内部创建3种主题

  1. Changelog-<storeid>-<partition>
  2. Repartition-<storeid>-<partition>
  3. <topicname>-<partition>

我不明白的是

  1. 当我将所有数据都保存在<topic>-<partition>
  2. 中时,为什么会创建changelog主题
  3. 重新分区主题是否包含分组后的数据.
  4. ,我发现Changelog和topicname-parition的大小大致相同.

数据中有什么不同,因此必须为此保存一个不同的文件.

解决方案

更改日志"和重新分区"内部Kafka主题特定于Kafka Streams.

从Kafka Wiki,

Kafka Streams允许状态流处理,即具有内部状态的运算符.内部状态在所谓的状态存储中进行管理.状态存储可以是短暂的(在失败时丢失)或容错的(在失败后恢复). Kafka Streams DSL使用的默认实现是一个容错状态存储,它使用1.一个内部创建并压缩的changelog主题(用于容错)和2.一个(或多个)RocksDB实例(用于缓存的键值查找).因此,在启动/停止应用程序以及倒带/重新处理的情况下,需要对内部数据进行正确的管理.

当流上有加入/聚合操作时,会创建

Changelog主题.实际上,聚集调用的结果将创建一个状态存储,并且为了容错,该状态存储由Kafka Changelog主题进行备份.

聚合结果存储在此内部主题中.重新启动应用程序且未更改application-id时,状态将从changelog主题中恢复.

在流上进行键修改操作时,会创建

重新分区主题.例如,groupByKey()操作创建重新分区主题.检查 JIRA页面,以了解有关自动创建重新分配主题的更多信息./p>

这两个内部主题使Kafka流具有容错的状态流处理功能.

重新分区主题在分组后是否包含数据?-是

Changelog的大小和topicname-parition的大小大致相同-可能所有聚合操作的结果都存储在此主题中.

有关更多详细信息,请检查 Kafka Wiki页面.

We are using kafka stream api for aggregation in which we are also using group by. We are also using state store where it saves the input topics data.

What i notice is

Kafka internally creates 3 kinds of topic

  1. Changelog-<storeid>-<partition>
  2. Repartition-<storeid>-<partition>
  3. <topicname>-<partition>

What I am not able to understand is

  1. Why it creates changelog topic when I have all the data in <topic>-<partition>
  2. Does repartition topic contains data after grouping.
  3. and I see that the size of Changelog and topicname-parition are approx same.

What is different in the data so that it has to save a different file for that.

解决方案

'Changelog' and 'repartition' internal Kafka topics are specific to Kafka Streams.

From Kafka Wiki,

Kafka Streams allows for stateful stream processing, i.e. operators that have an internal state. This internal state is managed in so-called state stores. A state store can be ephemeral (lost on failure) or fault-tolerant (restored after the failure). The default implementation used by Kafka Streams DSL is a fault-tolerant state store using 1. an internally created and compacted changelog topic (for fault-tolerance) and 2. one (or multiple) RocksDB instances (for cached key-value lookups). Thus, in case of starting/stopping applications and rewinding/reprocessing, this internal data needs to get managed correctly.

Changelog topics are created when there are join/aggregation operations on the stream. Actually the result of aggregation call creates a state store and for fault-tolerance the state store is backed up by a Kafka Changelog topic.

The aggregation results are stored into this internal topic. State will be recovered from changelog topic when applications is restarted and application-id wasn't changed.

Re-partition topics are created when there are key modifying operations on the stream. For example, groupByKey() operation creates repartition topic. Check JIRA page to know more about auto creation of re-parition topic.

These two internal topics enables Kafka streams to have fault-tolerant stateful stream processing capabilities.

Does repartition topic contains data after grouping? - Yes

The size of Changelog and topicname-parition are approx same - Possibly, the result of all aggregation operations are stored in this topic.

For more details, please check Kafka Wiki page.

这篇关于卡夫卡使用的内部主题是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆