当源主题分区更改时,为什么kafka流线程会消失?谁能指出与此有关的材料? [英] Why does kafka streams threads die when the source topic partitions changes ? Can anyone point to reading material around this?

查看:77
本文介绍了当源主题分区更改时,为什么kafka流线程会消失?谁能指出与此有关的材料?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于消息的吞吐量很高,因此增加了用于并行处理消息的分区数量.一旦我们增加了分区数量,订阅该主题的所有流线程就都消失了.我们更改了使用者组ID,然后重新启动了运行正常的应用程序.

We increased the number of partitions to parallel process the messages as the throughput of the message was high. As soon as we increased the number of partitions all the streams thread which were subscribed to that topic died. We changed the consumer group id then we restarted the application it worked fine.

我知道应用程序的分区changelog主题数应与源主题相同.我想知道这背后的原因.

I know that the number of partitions changelog topic of application should be same as source topic. I would like to know the reason behind this.

我看到了此链接-找不到原因

https://github.com/apache/kafka/blob/fdc742b1ade420682911b3e336ae04827639cc04/streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java#L122

基本上,这是if条件的原因.

Basically, reason behind this if condition.

推荐答案

输入主题分区定义并行度,如果您有状态操作(例如聚合或联接),则将这些操作的状态分片.如果您有X个输入主题分区,则每个X任务都带有一个状态分片.此外,状态由具有X分区的Kafka中的changelog主题支持,每个分片正好使用这些分区之一.

Input topic partitions define the level of parallelism, and if you have stateful operations like aggregation or join, the state of those operations in sharded. If you have X input topic partitions you get X tasks each with one state shard. Furthermore, state is backed by a changelog topic in Kafka with X partitions and each shard is using exactly one of those partitions.

如果将输入主题分区的数量更改为X + 1,Kafka Streams将尝试使用X存储分片创建X + 1任务,但是现有的changelog主题仅具有X分区.因此,您的应用程序的整个分区都会中断,并且Kafka Streams无法保证正确的处理,因此会因错误而关闭.

If you change the number of input topic partitions to X+1, Kafka Streams tries to create X+1 tasks with X store shards, however the exiting changelog topic has only X partitions. Thus, the whole partitioning of your application breaks and Kafka Streams cannot guaranteed correct processing and thus shuts down with an error.

还要注意,Kafka Streams假定输入数据按键分区.如果您更改输入主题分区的数量,则基于散列的分区也会更改可能导致错误输出的内容.

Also note, that Kafka Streams assume, that input data is partitioned by key. If you change the number of input topic partitions, the hash-based partitioning changes what may result in incorrect output, too.

通常,建议一开始就对主题进行过度分区,以避免出现此问题.如果确实需要横向扩展,则最好使用新的分区数创建一个新主题,并并行启动该应用程序的副本(具有新的应用程序ID).然后,您更新上游生产者应用程序以写入新主题,最后关闭旧应用程序.

In general, it's recommended to over-partition topics in the beginning to avoid this issue. If you really need to scale out, it is best to create a new topic with the new number of partitions, and start a copy of the application (with new application ID) in parallel. Afterwards, you update your upstream producer applications to write into the new topic, and finally shutdown the old application.

这篇关于当源主题分区更改时,为什么kafka流线程会消失?谁能指出与此有关的材料?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆