Kafka Consumer Rebalancing Algorithm

Question

Can someone please tell me what the rebalancing algorithm is for Kafka consumers? I would like to understand how the partition count and the number of consumer threads affect it.

Thanks,

Recommended answer

OK, so there are 2 rebalancing algorithms at the moment: Range and RoundRobin. They are also called partition assignment strategies.

For simplicity, assume we have a topic T1 with 10 partitions and 2 consumers with different configurations (to make the example clearer): C1 with num.streams set to 1 and C2 with num.streams set to 2.

Here's how that works with the Range strategy:

Range lays out the available partitions in numeric order and the consumer threads in lexicographic order. So in our case the partition order is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the consumer thread order is C1-0, C2-0, C2-1. The number of partitions is then divided by the number of consumer threads to determine how many partitions each consumer thread should own. Here 10 does not divide evenly by 3 (3 each, remainder 1), so the first thread, C1-0, gets one extra partition. The final partition assignment looks like this:

C1-0 gets partitions 0, 1, 2, 3
C2-0 gets partitions 4, 5, 6
C2-1 gets partitions 7, 8, 9

With 11 partitions the remainder is 2, so the first two threads each get an extra partition and the assignment for these consumers changes a bit:

C1-0 would get partitions 0, 1, 2, 3
C2-0 would get partitions 4, 5, 6, 7
C2-1 would get partitions 8, 9, 10

That's it.
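The Range steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in this answer, not Kafka's actual implementation; the function name range_assign and the thread IDs are made up for the example.

```python
def range_assign(num_partitions, threads):
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per thread."""
    ordered = sorted(threads)  # consumer threads in lexicographic order
    per_thread, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    start = 0
    for i, thread in enumerate(ordered):
        # The first `extra` threads each take one additional partition.
        count = per_thread + (1 if i < extra else 0)
        assignment[thread] = list(range(start, start + count))
        start += count
    return assignment


threads = ["C1-0", "C2-0", "C2-1"]
for n in (10, 11):
    print(n, range_assign(n, threads))
```

Running it for 10 and 11 partitions reproduces the two assignments listed above.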

The same configuration wouldn't work with the RoundRobin strategy, because it requires equal num.streams across all consumers subscribed to this topic, so let's assume both consumers now have num.streams set to 2. One major difference from the Range strategy is that you cannot predict what the assignment will be before the rebalance. Here's how that works with the RoundRobin strategy:

First, there are 2 conditions that MUST be satisfied before the actual assignment:

a) Every topic has the same number of streams within a consumer instance (that's why I mentioned above that a different number of threads per consumer will not work)
b) The set of subscribed topics is identical for every consumer instance within the group (we have one topic here, so that's not a problem now).

Once these 2 conditions are verified, the topic-partition pairs are sorted by hash code to reduce the chance that all partitions of one topic are assigned to a single consumer (when more than one topic is consumed).
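To see why sorting by hash code helps, here is a rough Python sketch with two hypothetical topics, T1 and T2. It uses zlib.crc32 purely as a stand-in for whatever hash Kafka applies to a topic-partition pair, so the resulting order is illustrative only and will not match what a real consumer group produces.

```python
import zlib


def sort_by_hash(topic_partitions):
    """Order (topic, partition) pairs by a deterministic stand-in hash."""
    def sort_key(tp):
        topic, partition = tp
        digest = zlib.crc32(f"{topic}-{partition}".encode("utf-8"))
        # The pair itself breaks ties so the order is fully deterministic.
        return (digest, tp)
    return sorted(topic_partitions, key=sort_key)


# Two assumed topics with 5 partitions each, listed topic by topic.
pairs = [(topic, p) for topic in ("T1", "T2") for p in range(5)]
mixed = sort_by_hash(pairs)
print(mixed)
```

Because the order no longer follows topic name and partition number, a round-robin pass over the sorted list tends to hand each consumer thread partitions from different topics.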

Finally, all topic-partition pairs are assigned to the available consumer threads in round-robin fashion. For example, if our topic-partitions end up sorted as T1-5, T1-3, T1-0, T1-8, T1-2, T1-1, T1-4, T1-7, T1-6, T1-9 and the consumer threads are C1-0, C1-1, C2-0, C2-1, then the assignment goes like this:

T1-5 goes to C1-0
T1-3 goes to C1-1
T1-0 goes to C2-0
T1-8 goes to C2-1
At this point no consumer threads are left, but there are still topic-partitions to assign, so iteration over the consumer threads starts over:
T1-2 goes to C1-0
T1-1 goes to C1-1
T1-4 goes to C2-0
T1-7 goes to C2-1
And again:
T1-6 goes to C1-0
T1-9 goes to C1-1

At this point all topic-partitions are assigned and each consumer thread owns a near-equal number of partitions: three each for C1-0 and C1-1, two each for C2-0 and C2-1.
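The round-robin pass walked through above can be sketched as follows. This is a minimal illustration, not Kafka's code: the function name round_robin_assign is made up, and the partition order is simply the example order from this answer rather than a real hash-code sort.

```python
from itertools import cycle


def round_robin_assign(ordered_partitions, threads):
    """Deal already-ordered topic-partitions to consumer threads one at a time."""
    ordered_threads = sorted(threads)
    assignment = {thread: [] for thread in ordered_threads}
    # cycle() restarts from the first thread whenever the list is exhausted.
    next_thread = cycle(ordered_threads)
    for tp in ordered_partitions:
        assignment[next(next_thread)].append(tp)
    return assignment


order = ["T1-5", "T1-3", "T1-0", "T1-8", "T1-2",
         "T1-1", "T1-4", "T1-7", "T1-6", "T1-9"]
threads = ["C1-0", "C1-1", "C2-0", "C2-1"]
result = round_robin_assign(order, threads)
for thread, owned in result.items():
    print(thread, owned)
```

With 10 partitions and 4 threads, the first two threads end up with three partitions and the last two with two, matching the walkthrough.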

Hope this helps.
