Kafka: Single consumer group in multiple instances

Question

I am implementing a Kafka-based solution for our application. As I understand the Kafka documentation, one consumer in a consumer group (which is a thread) is internally mapped to one partition of the subscribed topic.

Let's say I have a topic with 40 partitions and a high-level consumer running in 4 instances. I do not want one instance to consume the same messages consumed by another instance. But if one instance goes down, the other three instances should be able to process all the messages.

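  • Should I use the same consumer group with 10 threads per instance? Stack Overflow says that instances sharing a consumer group act as a traditional synchronous queue mechanism:

In Apache Kafka why can't there be more consumer instances than partitions?

  • Or should I use a different consumer group for each instance?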

Using the simple (low-level) consumer gives control over the partitions, but then if one instance goes down, the other three instances would not process the messages from the partitions that the first instance was consuming.

Recommended answer

First, to explain the concept of consumers and consumer groups:

Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group.

The records are effectively load balanced over the consumer instances in a consumer group. If all the consumer instances have different consumer groups, then each record is broadcast to all the consumer processes.
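The difference between load balancing and broadcast can be sketched with a toy model. This is not Kafka client code: the `deliver` function, the group names, and the instance names are hypothetical, and real Kafka picks the receiving consumer by partition ownership rather than by the per-record round-robin assumed here.

```python
from collections import Counter


def deliver(num_records, groups):
    """Count how many records each consumer receives.

    Toy model (an assumption, not Kafka's implementation): every
    subscribing group sees each record once, and inside a group the
    record goes to exactly one member, chosen round-robin.
    """
    received = Counter()
    for record in range(num_records):
        for members in groups.values():
            received[members[record % len(members)]] += 1
    return received


# Four instances sharing one group: the 40 records are split among them.
same_group = deliver(40, {"app": ["i1", "i2", "i3", "i4"]})

# Four instances, each in its own group: every instance sees every record.
separate_groups = deliver(
    40, {"g1": ["i1"], "g2": ["i2"], "g3": ["i3"], "g4": ["i4"]}
)

print(dict(same_group))       # 10 records per instance
print(dict(separate_groups))  # 40 records per instance
```

In the shared-group case the total number of deliveries equals the number of records; with one group per instance it is multiplied by the number of groups.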

Kafka implements consumption by dividing up the partitions in the log over the consumer instances, so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time. If new instances join the group, they take over some partitions from other members of the group; if an instance dies, its partitions are distributed to the remaining instances.
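The "fair share" idea and the takeover after a failure can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the `assign` function is a hypothetical round-robin assignor written for illustration, not one of Kafka's built-in assignment strategies, and the instance names are made up.

```python
def assign(partitions, members):
    """Toy round-robin assignment: partition i goes to member i mod N.

    Each partition ends up with exactly one owner in the group.
    """
    members = sorted(members)
    assignment = {m: [] for m in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


partitions = list(range(40))
instances = ["instance-1", "instance-2", "instance-3", "instance-4"]

# All four instances alive: 10 partitions each.
before = assign(partitions, instances)

# instance-4 dies: the group is reassigned over the three survivors.
after = assign(partitions, [m for m in instances if m != "instance-4"])

print({m: len(ps) for m, ps in before.items()})
print({m: len(ps) for m, ps in after.items()})
```

After the simulated failure all 40 partitions are still owned by someone, split 14/13/13 across the three remaining instances.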

Now to answer your questions:

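1. I do not want one instance to consume the same messages consumed by another instance. But if one instance goes down, the other three instances should be able to process all the messages.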

This is what Kafka does by default. You just have to give all 4 instances the same consumer group name (the `group.id` consumer setting).

2. Should I use the same consumer group with 10 threads per instance?

With 4 instances of 10 threads each, the group has 40 consumers for 40 partitions, so each thread is assigned exactly one Kafka partition to consume from, which is optimal. With fewer threads the partitions are still divided among the consumers in the group, but each thread then owns several partitions, which MAY overload some of the consumer instances.
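The effect of the thread count can be checked with simple arithmetic. The sketch below assumes a hypothetical round-robin assignment over all consumer threads in the group; the function names and thread labels are illustrative, and Kafka's actual assignor may distribute partitions differently.

```python
from collections import Counter


def partitions_per_thread(num_partitions, instances, threads_per_instance):
    """Return how many partitions each consumer thread owns (toy model)."""
    consumers = [
        f"instance-{i}/thread-{t}"
        for i in range(instances)
        for t in range(threads_per_instance)
    ]
    load = Counter({c: 0 for c in consumers})
    for partition in range(num_partitions):
        load[consumers[partition % len(consumers)]] += 1
    return load


def summarize(load):
    """Map 'partitions owned by a thread' to 'number of such threads'."""
    return dict(Counter(load.values()))


# 4 instances x 10 threads = 40 consumers: one partition per thread.
print(summarize(partitions_per_thread(40, 4, 10)))

# 4 instances x 3 threads = 12 consumers: some threads own 4 partitions,
# the rest own 3, so the load is no longer perfectly even.
print(summarize(partitions_per_thread(40, 4, 3)))
```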

3. In Apache Kafka, why can't there be more consumer instances than partitions?

Within a consumer group, a partition can be assigned to only one consumer instance. Thus, creating more consumer instances than partitions leaves the extra consumers idle; they will not consume any records from Kafka.
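To see why the extra consumers stay idle, consider a toy one-owner-per-partition assignment. The `idle_consumers` helper and the consumer names are hypothetical and only model the rule described above; this is not Kafka client code.

```python
def idle_consumers(num_partitions, consumers):
    """Return the consumers that end up without any partition.

    Toy model: each partition gets exactly one owner in the group,
    chosen round-robin over the consumer list.
    """
    owners = {p: consumers[p % len(consumers)] for p in range(num_partitions)}
    busy = set(owners.values())
    return [c for c in consumers if c not in busy]


# 50 consumers in one group, but only 40 partitions to hand out.
consumers = [f"consumer-{i}" for i in range(50)]
idle = idle_consumers(40, consumers)

print(len(idle))  # 10 consumers have nothing to read
```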

4. Should I use a different consumer group for each instance?

No. This would duplicate the records: because the instances are in different consumer groups, every record would be sent to all of them.

Hope this clarifies your doubts.
