Debugging imbalanced Kafka messages_in rate
Problem description
I have a 4-node Kafka cluster in production where we use a custom partitioner that takes an id mod 64 to determine the partition. Since last week, the Kafka messages_in rate on one of our nodes has been imbalanced, as can be seen in the attached graph. The pink line shows the message-in rate on the kafka01 node, and the bluish-yellow lines show the message-in rate on the other 3 boxes. I'm using Datadog for monitoring, with the metric kafka.messages_in.rate. Assuming there has been no change in the id distribution, there should have been no change in the distribution of the message-in rate. The steps I've taken to debug the issue are:
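- The cluster has 16 partition leaders on each of the 4 nodes.
- The ISRs are also balanced across the 4 boxes, with 32 ISRs on each box [replication factor 2].
- Network in and network out are almost equal on all 4 boxes.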
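To make the setup concrete, here is a minimal sketch of how an id mod 64 partitioner maps onto 4 brokers with 16 leaders each, and how a single hot id can skew one broker's message count even though leadership is balanced. The round-robin leader assignment, the `leader_for` and `messages_per_broker` helpers, and the sample ids are assumptions made for illustration; the real leader layout should be read from the cluster.

```python
from collections import Counter

NUM_PARTITIONS = 64
BROKERS = ["kafka01", "kafka02", "kafka03", "kafka04"]


def partition_for(record_id: int) -> int:
    """Partition chosen by the custom partitioner: id mod 64."""
    return record_id % NUM_PARTITIONS


def leader_for(partition: int) -> str:
    """Hypothetical round-robin leader layout: 16 partitions per broker."""
    return BROKERS[partition % len(BROKERS)]


def messages_per_broker(ids):
    """Count how many messages each broker would receive as partition leader."""
    counts = Counter({broker: 0 for broker in BROKERS})
    for record_id in ids:
        counts[leader_for(partition_for(record_id))] += 1
    return dict(counts)


# Evenly spread ids give every broker the same message count.
uniform = messages_per_broker(range(6400))

# One very active id (128 maps to partition 0) skews a single broker.
skewed = messages_per_broker(list(range(6400)) + [128] * 3000)

print(uniform)
print(skewed)
```

Under these assumptions, a balanced leader count does not by itself guarantee a balanced message rate: the rate also depends on how the ids are distributed across partitions.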
I'd appreciate any help, or pointers to areas or metrics I could look into, to debug this anomaly.
For people searching for this in the future: https://mail-archives.apache.org/mod_mbox/kafka-users/201710.mbox/%3CCALaekbwkSKapqPwsyuAoHGiSnc1+3jF2wF+2FDZbAVx61E+c2w@mail.gmail.com%3E
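Recommended answer
Things to debug:
- Enable trace logging on the brokers.
- Over a short time window, compare the logs of the broker that received more requests with those of a broker that received fewer. Both will contain a large number of produce requests to analyze.
- Search the logs for ProducerRequest. This will tell you whether partitioning is happening as expected and which hosts the broker is receiving more requests from.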
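The log comparison described above could be automated roughly as follows. This is only a sketch: the exact layout of broker trace log lines depends on the Kafka version, so the two regular expressions, the `summarize` helper, and the sample lines are all made-up assumptions that need adapting to the real log output.

```python
import re
from collections import Counter

# Assumed patterns; adjust them to the trace log format of your Kafka version.
PARTITION_RE = re.compile(r"partition=(\d+)")
CLIENT_RE = re.compile(r"from connection \S+?-(\d+\.\d+\.\d+\.\d+):\d+")


def summarize(lines):
    """Count produce requests per partition and per client host."""
    by_partition = Counter()
    by_client = Counter()
    for line in lines:
        if "ProducerRequest" not in line:
            continue  # ignore fetch and other request types
        for partition in PARTITION_RE.findall(line):
            by_partition[int(partition)] += 1
        match = CLIENT_RE.search(line)
        if match:
            by_client[match.group(1)] += 1
    return by_partition, by_client


# Hypothetical log lines, for illustration only.
sample = [
    "TRACE ProducerRequest partition=0 from connection 10.0.0.1:9092-10.0.0.7:51000",
    "TRACE ProducerRequest partition=0 from connection 10.0.0.1:9092-10.0.0.7:51000",
    "TRACE ProducerRequest partition=5 from connection 10.0.0.1:9092-10.0.0.8:51001",
    "TRACE FetchRequest partition=0 from connection 10.0.0.1:9092-10.0.0.9:51002",
]

by_partition, by_client = summarize(sample)
print(by_partition.most_common())
print(by_client.most_common())
```

Running the same summary over logs from kafka01 and from one of the other brokers for the same time window would show whether the extra traffic is concentrated on particular partitions or comes from particular producer hosts.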