Why do Kafka Streams threads die when the source topic partitions change? Can anyone point to reading material around this?

Question

We increased the number of partitions to process messages in parallel, because message throughput was high. As soon as we increased the number of partitions, all the stream threads subscribed to that topic died. We changed the consumer group id and restarted the application, and it worked fine.

I know that the number of partitions of the application's changelog topic should be the same as that of the source topic. I would like to know the reason behind this.

I saw this link - https://issues.apache.org/jira/browse/KAFKA-6063?jql=project%20%3D%20KAFKA%20AND%20component%20%3D%20streams%20AND%20text%20~%20%22partition%22

I could not find the reason there.

https://github.com/apache/kafka/blob/fdc742b1ade420682911b3e336ae04827639cc04/streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java#L122

Basically, the reason behind this is this if condition.

Answer

Input topic partitions define the level of parallelism, and if you have stateful operations like aggregations or joins, the state of those operations is sharded. If you have X input topic partitions, you get X tasks, each with one state shard. Furthermore, state is backed by a changelog topic in Kafka with X partitions, and each shard uses exactly one of those partitions.
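The 1:1 mapping described above can be sketched as a toy model. This is not Kafka's actual code; the `Task` record and `assignTasks` method are hypothetical illustrations of the invariant "input partition i → task i → state shard i → changelog partition i":

```java
import java.util.ArrayList;
import java.util.List;

public class TaskSharding {
    // Hypothetical model: each task owns one input partition, one state
    // shard, and exactly one changelog partition, all with the same index.
    record Task(int inputPartition, int stateShard, int changelogPartition) {}

    static List<Task> assignTasks(int inputPartitions) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < inputPartitions; i++) {
            tasks.add(new Task(i, i, i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        // 4 input partitions -> 4 tasks, shard i backed by changelog partition i
        List<Task> tasks = assignTasks(4);
        System.out.println(tasks.size());
        System.out.println(tasks.get(2).changelogPartition());
    }
}
```

This rigid indexing is exactly why the changelog topic must keep the same partition count as the input topic: shard i can only be restored from changelog partition i.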

If you change the number of input topic partitions to X+1, Kafka Streams tries to create X+1 tasks, but the existing changelog topic has only X partitions to back their state shards. Thus, the whole partitioning of your application breaks, and Kafka Streams cannot guarantee correct processing, so it shuts down with an error.
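The failing check is roughly of this shape. This is a hedged sketch, not the actual InternalTopicManager source linked above; the real code throws a StreamsException, and the message text here is invented:

```java
public class ChangelogValidation {
    // Sketch of the idea: an internal (changelog) topic that already
    // exists with a different partition count than expected is fatal.
    static void validate(String topic, int existingPartitions, int expectedPartitions) {
        if (existingPartitions != expectedPartitions) {
            throw new IllegalStateException(
                "Existing internal topic " + topic + " has " + existingPartitions
                + " partitions, but " + expectedPartitions + " are expected; "
                + "shutting down rather than risk misplaced state.");
        }
    }

    public static void main(String[] args) {
        validate("app-store-changelog", 4, 4);      // matches: no error
        try {
            validate("app-store-changelog", 4, 5);  // input topic grew to 5
        } catch (IllegalStateException e) {
            System.out.println("fatal: " + e.getMessage());
        }
    }
}
```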

Also note that Kafka Streams assumes that input data is partitioned by key. If you change the number of input topic partitions, the hash-based partitioning changes, which may result in incorrect output, too.
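To see why re-partitioning re-routes keys: Kafka's default partitioner hashes the key bytes (with murmur2) and takes the result modulo the partition count. The sketch below uses String.hashCode as a simplified stand-in for murmur2, so the exact partition numbers differ from real Kafka, but the effect is the same: the same key can land on a different partition once the count changes.

```java
public class KeyRouting {
    // Simplified stand-in for Kafka's default partitioner: mask the hash
    // to be non-negative, then take it modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "user-42";
        int before = partitionFor(key, 4);
        int after  = partitionFor(key, 5);
        // With a different partition count, the same key may be routed
        // to a different partition, so its records no longer meet the
        // state that was accumulated for that key earlier.
        System.out.println(before + " vs " + after);
    }
}
```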

In general, it's recommended to over-partition topics in the beginning to avoid this issue. If you really need to scale out, it is best to create a new topic with the new number of partitions and start a copy of the application (with a new application ID) in parallel. Afterwards, you update your upstream producer applications to write into the new topic, and finally shut down the old application.
