Kafka 使用了哪些内部主题? [英] What are internal topics used in Kafka?

查看:57
本文介绍了Kafka 使用了哪些内部主题?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们使用 kafka 流 api 进行聚合,其中我们也使用 group by.我们还使用状态存储来保存输入主题数据.

我注意到的是

Kafka内部创建了3种topic

  1. Changelog--
  2. 重新分区--
  3. -

我无法理解的是

  1. 为什么当我拥有 -
  2. 中的所有数据时,它会创建变更日志主题
  3. 重新分区主题是否包含分组后的数据.
  4. 而且我看到 Changelog 和 topicname-parition 的大小大约相同.

数据有什么不同,因此必须为此保存不同的文件.

解决方案

'Changelog' 和 'repartition' 内部 Kafka 主题特定于 Kafka Streams.

来自 Kafka Wiki,

<块引用>

Kafka Streams 允许有状态的流处理,即具有内部状态的操作符.这种内部状态在所谓的状态存储中进行管理.状态存储可以是短暂的(失败时丢失)或容错的(失败后恢复).Kafka Streams DSL 使用的默认实现是一种容错状态存储,使用 1. 内部创建和压缩的变更日志主题(用于容错)和 2. 一个(或多个)RocksDB 实例(用于缓存的键值查找).因此,在启动/停止应用程序和倒带/重新处理的情况下,需要正确管理这些内部数据.

变更日志主题是在流上有加入/聚合操作时创建的.实际上,聚合调用的结果会创建一个状态存储,并且为了容错,状态存储由 Kafka Changelog 主题备份.

聚合结果存储在这个内部主题中.当应用程序重新启动且 application-id 未更改时,将从更改日志主题中恢复状态.

重新分区主题是在流上有关键修改操作时创建的.例如 groupByKey() 操作创建重新分区主题.查看 JIRA 页面以了解有关自动创建重新分区主题的更多信息.>

这两个内部主题使 Kafka 流具有容错状态的流处理能力.

重新分区主题是否包含分组后的数据? - 是

Changelog 和 topicname-parition 的大小大致相同 - 可能所有聚合操作的结果都存储在此主题中.

更多详情,请查看Kafka Wiki页面.

We are using kafka stream api for aggregation in which we are also using group by. We are also using state store where it saves the input topics data.

What i notice is

Kafka internally creates 3 kinds of topic

  1. Changelog-<storeid>-<partition>
  2. Repartition-<storeid>-<partition>
  3. <topicname>-<partition>

What I am not able to understand is

  1. Why it creates changelog topic when I have all the data in <topic>-<partition>
  2. Does repartition topic contains data after grouping.
  3. and I see that the size of Changelog and topicname-parition are approx same.

What is different in the data so that it has to save a different file for that.

解决方案

'Changelog' and 'repartition' internal Kafka topics are specific to Kafka Streams.

From Kafka Wiki,

Kafka Streams allows for stateful stream processing, i.e. operators that have an internal state. This internal state is managed in so-called state stores. A state store can be ephemeral (lost on failure) or fault-tolerant (restored after the failure). The default implementation used by Kafka Streams DSL is a fault-tolerant state store using 1. an internally created and compacted changelog topic (for fault-tolerance) and 2. one (or multiple) RocksDB instances (for cached key-value lookups). Thus, in case of starting/stopping applications and rewinding/reprocessing, this internal data needs to get managed correctly.

Changelog topics are created when there are join/aggregation operations on the stream. Actually the result of aggregation call creates a state store and for fault-tolerance the state store is backed up by a Kafka Changelog topic.

The aggregation results are stored into this internal topic. State will be recovered from changelog topic when applications is restarted and application-id wasn't changed.

Re-partition topics are created when there are key modifying operations on the stream. For example, groupByKey() operation creates repartition topic. Check JIRA page to know more about auto creation of re-parition topic.

These two internal topics enables Kafka streams to have fault-tolerant stateful stream processing capabilities.

Does repartition topic contains data after grouping? - Yes

The size of Changelog and topicname-parition are approx same - Possibly, the result of all aggregation operations are stored in this topic.

For more details, please check Kafka Wiki page.

这篇关于Kafka 使用了哪些内部主题?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆