Cost of rebalancing partitions of a topic in Kafka

Question

I am trying to come up with a design for consuming from Kafka, using Kafka version 0.8.1.1. I am thinking of a system where a consumer is created every few seconds, consumes data from Kafka, processes it, and then quits after committing its offsets to Kafka. At any point in time I expect 250-300 consumers to be active (running as thread pools on different machines).

  1. How and when does a rebalance of partitions happen?

  2. How costly is rebalancing partitions among the consumers? I expect a consumer to finish, or a new one to join the same consumer group, every few seconds, so I want to know the overhead and latency of a rebalance operation.

  3. Say consumer C1 has partitions P1, P2 and P3 assigned to it, and it is processing message M1 from partition P1. Now consumer C2 joins the group. How are the partitions divided between C1 and C2? Is it possible that C1's commit for M1 (C1 may take some time to commit to Kafka) is rejected, so that M1 is treated as a fresh message and delivered to another consumer? I know Kafka uses an at-least-once delivery model, but I want to confirm whether repartitioning can cause the same message to be redelivered.
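
The split asked about in question 3 can be illustrated with range-style assignment, which, to our understanding, is how the 0.8 high-level consumer divides a topic's partitions: sort the partitions and the consumer ids, give each consumer a contiguous run, and hand the remainder to the first consumers in sort order. This is a minimal sketch under that assumption; `range_assign` is a hypothetical name, and the ids "C1"/"C2" are illustrative (real 0.8 consumer ids are generated strings, so which consumer gets the extra partition depends on how the ids happen to sort).

```python
def range_assign(partitions, consumers):
    """Hypothetical range-style assignment: contiguous runs, remainder to the first ids."""
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        raise ValueError("need at least one consumer")
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers in sort order each take one extra partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


before = range_assign(["P1", "P2", "P3"], ["C1"])
after = range_assign(["P1", "P2", "P3"], ["C1", "C2"])
print(before)  # {'C1': ['P1', 'P2', 'P3']}
print(after)   # {'C1': ['P1', 'P2'], 'C2': ['P3']}
```

With this ordering only P3 changes owner, so M1's partition P1 stays with C1; if C2's id sorted first, C2 would take P1 and P2 instead and M1's partition would move.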

Answer

I'd rethink the design if I were you. Perhaps you need a consumer pool?
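
One way to read the consumer-pool suggestion: keep a small, fixed number of long-lived consumers in the group, so membership (and therefore rebalancing) stays stable, and hand the short-lived processing work to a fixed thread pool instead of creating a new consumer for each job. This sketch uses no Kafka API: `FakePartition` is a hypothetical in-memory stand-in for a partition, and `run_long_lived_consumer` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


class FakePartition:
    """Hypothetical in-memory stand-in for a Kafka partition (not a Kafka API)."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # next offset to read after a restart

    def fetch(self, offset, max_messages):
        return self.messages[offset:offset + max_messages]


def run_long_lived_consumer(partition, handle, pool_size=4, batch_size=10):
    """One consumer stays in the group; the per-message work runs on a fixed pool."""
    results = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        offset = partition.committed
        while True:
            batch = partition.fetch(offset, batch_size)
            if not batch:
                break
            # pool.map keeps input order and re-raises a handler's exception here.
            results.extend(pool.map(handle, batch))
            offset += len(batch)
            # Commit only after the whole batch is processed (at-least-once).
            partition.committed = offset
    return results


p = FakePartition(range(25))
out = run_long_lived_consumer(p, lambda m: m * 2)
print(out[:5], p.committed)  # [0, 2, 4, 6, 8] 25
```

Because the offset is committed only after every result of a batch has been collected, a failing handler leaves the committed offset at the last fully processed batch, and those messages are read again on restart.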

  1. Rebalancing happens every time a consumer joins or leaves the group.

  2. Kafka and the current consumer were definitely designed for long-running consumers. The new consumer design (planned for 0.9) will handle short-lived consumers better. In my experience, rebalances take 100-500 ms, depending largely on how ZooKeeper is doing.
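
A rough, back-of-the-envelope way to see why churn matters: by Little's law, a group with N active consumers that each live T seconds sees about N/T joins per second, and as many leaves. Multiplying the change rate by the 100-500 ms per rebalance quoted above gives an upper bound on how much rebalancing time accumulates per second (it assumes every change triggers its own rebalance; in practice several changes may be folded into one). The function names and both scenarios are hypothetical readings of the question's "every few seconds".

```python
def membership_changes_per_second(active_consumers, lifetime_s):
    # Little's law: arrival rate = average population / average time in the group.
    joins_per_s = active_consumers / lifetime_s
    # In steady state every join is eventually matched by a leave.
    return 2 * joins_per_s


def rebalance_seconds_per_second(changes_per_s, rebalance_s):
    # Upper bound: assumes every change triggers its own rebalance.
    return changes_per_s * rebalance_s


# Reading A (hypothetical): one join or leave every 2 seconds.
slow = 1 / 2
# Reading B (hypothetical): 300 active consumers, each living about 5 seconds.
fast = membership_changes_per_second(active_consumers=300, lifetime_s=5)

for label, rate in (("A", slow), ("B", fast)):
    low = rebalance_seconds_per_second(rate, 0.1)
    high = rebalance_seconds_per_second(rate, 0.5)
    print(f"{label}: {rate:g} changes/s -> {low:.2f} to {high:.2f} s of rebalancing per second")
```

Under reading A the bound is roughly 5-25% of wall-clock time; under reading B it exceeds one second of rebalancing per second, meaning the group would effectively be rebalancing continuously.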

  3. Yes, duplicates often happen during rebalancing. That's why we try to avoid rebalances. You can try to work around this by committing offsets more frequently, but with 300 consumers committing frequently and many consumers joining and leaving, your ZooKeeper may become a bottleneck.
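
The duplicate window and the commit-frequency trade-off can be simulated under one assumption: after a rebalance, the new owner of a partition starts from the last committed offset, so every message the old owner processed after that commit is processed again. In this model C1's commit is never rejected; the redelivery comes only from the new owner resuming at the older committed offset. `simulate_rebalance` and its parameters are hypothetical, and the commit count stands in for offset writes (to ZooKeeper, in the setup the answer describes).

```python
def simulate_rebalance(total, commit_every, rebalance_at):
    """Return (duplicates, commits) when ownership moves after `rebalance_at` messages."""
    processed = []  # offsets handed to application code, duplicates included
    commits = 0
    committed = 0   # next offset a new owner will start from

    # Old owner processes offsets [0, rebalance_at), committing periodically.
    for offset in range(rebalance_at):
        processed.append(offset)
        if (offset + 1) % commit_every == 0:
            committed = offset + 1
            commits += 1

    # New owner resumes from the last committed offset, not from rebalance_at.
    for offset in range(committed, total):
        processed.append(offset)
        if (offset + 1) % commit_every == 0:
            committed = offset + 1
            commits += 1

    duplicates = len(processed) - len(set(processed))
    return duplicates, commits


for every in (1, 10, 100):
    print(every, simulate_rebalance(total=1000, commit_every=every, rebalance_at=157))
# prints: 1 (0, 1000) / 10 (7, 100) / 100 (57, 10)
```

Committing after every message removes the duplicates in this model but multiplies the number of offset writes, which is the ZooKeeper-load trade-off the answer warns about.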
