Uneven Distribution of Messages in Kafka Partitions


Problem description

I have a topic with 10 partitions, one consumer group with 4 consumers, and a worker size of 3.

The messages are unevenly distributed across the partitions: one partition holds a large amount of data while another is empty.

How can I make my producer distribute the load evenly across all partitions, so that every partition is properly utilized?

Recommended answer

According to the JavaDoc comment in the DefaultPartitioner class itself, the default partitioning strategy is:

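  • If a partition is specified in the record, use it.
  • If no partition is specified but a key is present, choose a partition based on a hash of the key.
  • If no partition or key is present, choose a partition in a round-robin fashion.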

https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java
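The three rules above can be sketched as a small Python function. This is only an illustrative model, not Kafka's code: the name `choose_partition` and the round-robin counter are hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka applies to the serialized key, so the partition numbers it produces will not match a real cluster.

```python
import itertools
import zlib

NUM_PARTITIONS = 10  # matches the 10-partition topic in the question

# Hypothetical shared counter used for the keyless round-robin case.
_round_robin = itertools.count()


def choose_partition(key=None, partition=None, num_partitions=NUM_PARTITIONS):
    """Model the three documented rules of the default strategy."""
    # Rule 1: an explicitly specified partition always wins.
    if partition is not None:
        return partition
    # Rule 2: with a key, hash it. crc32 is a stand-in (assumption);
    # Kafka itself uses murmur2 on the serialized key bytes.
    if key is not None:
        return zlib.crc32(key) % num_partitions
    # Rule 3: no partition and no key, so rotate through the partitions.
    return next(_round_robin) % num_partitions
```

In this model, calling `choose_partition(key=b"order-1")` repeatedly always returns the same partition, which is why a repeated key concentrates messages in one place.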

So there are two possible causes of the uneven distribution, depending on whether you specify a key when producing the message:

  • If you are specifying a key and still getting an uneven distribution with the DefaultPartitioner, the most likely explanation is that you are specifying the same key many times, so all of those messages hash to the same partition.

  • If you are not specifying a key and are using the DefaultPartitioner, a non-obvious behavior could be at work. Based on the strategy above you would expect a round-robin distribution of messages, but this is not necessarily the case. An optimization introduced in 0.8.0 can cause the same partition to be used repeatedly. See this FAQ entry for a more detailed explanation: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?
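Both causes can be reproduced in a rough simulation, which may help when deciding which one applies. Everything here is an assumption made for illustration: the function names are hypothetical, `zlib.crc32` stands in for Kafka's murmur2 hash, and the keyless model simply stays on one partition for `refresh_every` sends before moving to the next, whereas the real producer (as the linked FAQ describes it) switches partition on a time-based metadata refresh.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 10  # matches the 10-partition topic in the question


def keyed_partition(key: bytes) -> int:
    """Stand-in for key hashing (crc32 here; Kafka uses murmur2)."""
    return zlib.crc32(key) % NUM_PARTITIONS


def keyed_distribution(keys):
    """Count messages per partition when every message carries a key."""
    return Counter(keyed_partition(k) for k in keys)


def sticky_distribution(num_messages, refresh_every,
                        num_partitions=NUM_PARTITIONS):
    """Count keyless messages per partition under a simplified
    'stick to one partition for a while' model (assumption)."""
    counts = Counter()
    for i in range(num_messages):
        # The partition changes only once per refresh_every sends.
        counts[(i // refresh_every) % num_partitions] += 1
    return counts


if __name__ == "__main__":
    # Cause 1: one hot key dominates the traffic.
    keys = [b"customer-1"] * 900
    keys += [f"customer-{i}".encode() for i in range(2, 102)]
    print("keyed: ", sorted(keyed_distribution(keys).items()))

    # Cause 2: no key, but the producer never leaves its first partition
    # because fewer messages are sent than the refresh interval allows.
    print("sticky:", sorted(sticky_distribution(1000, 5000).items()))
```

In the first case, the partition that the hot key hashes to receives at least 900 of the 1000 messages. In the second, all 1000 messages land in a single partition. If the real topic shows the first pattern, a key with more distinct values is worth considering; if it shows the second, the keyless behavior described in the FAQ is the more likely cause.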
