Kafka Consumer CommitFailedException

Question

I am working on a Kafka consumer program. Recently we deployed it in the PROD environment, where we ran into the following issue:

[main] INFO com.cisco.kafka.consumer.RTRKafkaConsumer - No. of records fetched: 1
[kafka-coordinator-heartbeat-thread | otm-opl-group] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Group coordinator opl-kafka-prd2-01:9092 (id: 2147483644 rack: null) is unavailable or invalid, will attempt rediscovery
[kafka-coordinator-heartbeat-thread | otm-opl-group] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Discovered group coordinator opl-kafka-prd2-01:9092 (id: 2147483644 rack: null)
[kafka-coordinator-heartbeat-thread | otm-opl-group] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Attempt to heartbeat failed for since member id consumer-otm-opl-group-1-953dfa46-9ced-472f-b24f-36d78c6b940b is not valid.
[main] INFO com.cisco.kafka.consumer.RTRKafkaConsumer - Batch start offset: 9329428
[main] INFO com.cisco.kafka.consumer.RTRKafkaConsumer - Batch Processing Successful.
[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Failing OffsetCommit request since the consumer is not part of an active group
Exception in thread "main" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1061)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:936)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1387)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1349)
    at com.cisco.kafka.consumer.RTRKafkaConsumer.main(RTRKafkaConsumer.java:72)

My understanding is that by the time the group coordinator is unavailable and re-discovered, the heartbeat interval (3 seconds per the documentation) expires and the consumer is kicked out of the group. Is this correct? If so, what should the workaround be? If I'm wrong, please help me understand this issue and suggest any ideas you have to fix it. I can share the code if needed.

Answer

The exception you are referring to

Exception in thread "main" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

gives a hint on what is happening and what can be done to solve the problem. In the code, this exception is described as:

",当使用KafkaConsumer#commitSync()的偏移提交失败并出现不可恢复的错误时,将引发此异常.当成功完成提交之前完成组重新平衡时,可能会发生这种情况.在这种情况下,通常无法重试该提交,因为某些分区可能已经分配给该组中的另一个成员.

"This exception is raised when an offset commit with KafkaConsumer#commitSync() fails with an unrecoverable error. This can happen when a group rebalance completes before the commit could be successfully applied. In this case, the commit cannot generally be retried because some of the partitions may have already been assigned to another member in the group."

In my experience, the thrown error message can be caused by different things, although they are all related to the consumer no longer being assigned the partition:

  1. Creating more and more consumers without closing them
  2. Timeout of poll
  3. Timeout of heartbeat
  4. Outdated Kerberos ticket

1. Creating more and more consumers without closing them

A rebalance takes place whenever a consumer joins an existing consumer group. It is therefore essential to close the consumer after use, or to keep reusing the same instance instead of creating a new KafkaConsumer object for every message or iteration, as in the sketch below.
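
As a rough sketch (the topic name, properties, and loop condition are assumptions for illustration, not taken from the question), this reuses one properly closed consumer instead of creating a new one per iteration:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReusedConsumer {
    // Anti-pattern: a new KafkaConsumer per message/iteration. Each new
    // instance joins the group and forces a rebalance, during which commits
    // from the previously assigned members fail.
    // Fix: create one instance, reuse it, and let try-with-resources call
    // close() (KafkaConsumer implements AutoCloseable) so the group sees a
    // single clean leave.
    public static void run(Properties props) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // hypothetical topic
            while (!Thread.currentThread().isInterrupted()) {
                consumer.poll(Duration.ofMillis(500)); // same instance on every iteration
            }
        }
    }
}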

2. Timeout of poll

[...] that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing.

The configuration max.poll.interval.ms defaults to 300000 ms (5 minutes). Since your consumer is taking longer than those 5 minutes, it is considered failed and the group rebalances in order to reassign the partitions to another member (see Consumer Configuration).

A possible solution is also given in the error message:

You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

The consumer reads all the messages again because, as the error shows, it was not able to commit the offsets. That means if you start a consumer with the same group.id, it thinks it has never read anything from that topic.
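
A sketch of applying that advice when building the consumer properties (the numbers are illustrative assumptions, not tuned recommendations):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class PollTuning {
    static Properties tunedProps() {
        Properties props = new Properties();
        // bootstrap.servers, group.id and deserializers omitted for brevity

        // Either give the poll loop more headroom (here 10 minutes instead of
        // the 5-minute default)...
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");
        // ...or return smaller batches so each loop iteration finishes sooner
        // (the default is 500 records):
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
        return props;
    }
}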

3. Timeout of heartbeat

There are two main configurations in your KafkaConsumer that deal with heartbeats: heartbeat.interval.ms and session.timeout.ms.

In a separate background thread, your KafkaConsumer sends periodic heartbeats to the server. If the consumer crashes or is unable to send heartbeats for the duration of session.timeout.ms, the consumer is considered dead and its partitions are reassigned. Once that rebalance is triggered, your consumer can no longer commit anything from an "old assigned" partition, as the description of the CommitFailedException puts it: "This can happen when a group rebalance completes before the commit could be successfully applied."
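
If re-processing by the new assignee is acceptable (Kafka's usual at-least-once behavior), one option is to treat such a failed commit as non-fatal instead of letting it kill the main thread, as in this sketch:

import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SafeCommit {
    // After a rebalance the partitions may already belong to another member,
    // so the commit cannot be retried; the new assignee re-reads from the
    // last committed offset (at-least-once semantics).
    static void commitQuietly(KafkaConsumer<?, ?> consumer) {
        try {
            consumer.commitSync();
        } catch (CommitFailedException e) {
            System.err.println("Commit failed after rebalance; records may be re-processed: " + e.getMessage());
        }
    }
}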

Increase the settings heartbeat.interval.ms and session.timeout.ms while following the recommendation: "The heartbeat.interval.ms must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value."
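
A sketch of such settings that respects the 1/3 rule (the values are illustrative assumptions):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class HeartbeatTuning {
    static Properties tunedProps() {
        Properties props = new Properties();
        // Keep heartbeat.interval.ms at or below one third of session.timeout.ms.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");    // 30 s
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000"); // 30 s / 3
        return props;
    }
}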

Just keep in mind that changing these values always comes with a trade-off. You get either

  • more frequent rebalances but a shorter reaction time to identify dead consumers, or
  • less frequent rebalances and a longer reaction time to identify dead consumers.

4. Outdated Kerberos ticket

On our production cluster we have seen the CommitFailedException just after the application was not able to renew its Kerberos ticket.

